PROPER LOCAL SCORING RULES

By Matthew Parry¹, A. Philip Dawid and Steffen Lauritzen

University of Otago, University of Cambridge and University of Oxford 

We investigate proper scoring rules for continuous distributions
on the real line. It is known that the log score is the only such rule
that depends on the quoted density only through its value at the
outcome that materializes. Here we allow further dependence on a finite
number $m$ of derivatives of the density at the outcome, and describe
a large class of such $m$-local proper scoring rules: these exist for all
even $m$ but no odd $m$. We further show that for $m \ge 2$ all such
$m$-local rules can be computed without knowledge of the normalizing
constant of the distribution.

1. Introduction. A scoring rule $S(x, Q)$ is a loss function measuring the
quality of a quoted distribution $Q$, for an uncertain quantity $X$, when the
realized value of $X$ is $x$. It is proper if it encourages honesty in the sense
that the expected score $E_{X \sim P} S(X, Q)$, where $X$ has distribution $P$,
is minimized by the choice $Q = P$.

Traditionally, a scoring rule has been termed local if it depends on the
density function $q(\cdot)$ of $Q$ only through its value, $q(x)$, at $x$. With
this definition, any proper local scoring rule is equivalent to the log score,
$S(x, Q) = -\ln q(x)$. However, we can weaken the locality condition by
allowing further dependence on a finite number $m$ of derivatives of $q(\cdot)$
at $x$, and this introduces many further possibilities. We term $m$ the order
of the rule.

In this paper we describe a large class of such order-$m$ proper local scoring
rules for densities on the real line. These turn out to depend on the
density $q(\cdot)$ in a way that is insensitive to a multiplicative constant, and
hence can be computed without knowledge of the normalizing constant of $q$.

Hyvärinen (2005) proposed a method for approximating a distribution $P$
on $\mathcal{X} = \mathbb{R}^k$ by a distribution $Q$ in a specified family
$\mathcal{P}$ of distributions by minimizing $d(P, Q)$ over $Q \in \mathcal{P}$,
where

(1)  $$d(P, Q) = \frac{1}{2} \int dx\, p(x)\, |\nabla \ln p(x) - \nabla \ln q(x)|^2$$

with $\nabla$ denoting gradient. Since $q$ enters this expression only through
$\nabla \ln q$, it is clear that the minimization only requires knowledge of $q$
up to a multiplicative factor. Using integration by parts, Hyvärinen (2005)
further showed that minimization of the divergence $d(P, Q)$ in (1) is
equivalent to minimizing

$$S(P, Q) = E_P\{\Delta \ln q(X) + \tfrac{1}{2} |\nabla \ln q(X)|^2\}$$

[where $\Delta$ denotes the Laplacian operator
$\sum_{i=1}^{k} \partial^2/(\partial x_i)^2$], which is a scoring
rule of the type discussed in this paper: see Section 2.5 below.

Received January 2011; revised January 2012.

¹Supported by EPSRC Statistics Mobility Fellowship EP/E009670.

AMS 2000 subject classifications. Primary 62C99; secondary 62A99.

Key words and phrases. Bregman score, concavity, divergence, entropy,
Euler-Lagrange equation, homogeneity, integration by parts, local function,
score matching, variational methods.

This is an electronic reprint of the original article published by the
Institute of Mathematical Statistics in The Annals of Statistics,
2012, Vol. 40, No. 1, 561-592. This reprint differs from the original in
pagination and typographic detail.

The plan of the paper is as follows. In Section 2 we introduce proper 
scoring rules, with some examples and applications. Section 3 formalizes the 
notion of a local function, its representations and derivatives. In Section 4 we 
apply integration by parts and the calculus of variations to develop a "key 
equation," which is further investigated in Section 5 through an analysis of 
fundamental differential operators associated with local functions. Section 6 
describes the solutions to the key equation, which we term "key local scoring
rules," in terms of a homogeneous function $\phi$. In Section 7 we point out
that distinct choices of $\phi$ can generate the same scoring rule, and consider
some implications; in particular, we show that key $m$-local scoring rules
exist for any even order $m$, but for no odd order. Section 9 examines when
this construction does indeed yield a proper local scoring rule, concavity
of $\phi$ being crucial. Section 10 devotes further attention to boundary terms
arising in the integration by parts. In Section 11 we study how the problem 
and its solution transform under an invertible mapping of the sample space, 
and develop an invariant formulation. 

1.1. Related work. In this paper we are concerned with characterizing 
m-local proper scoring rules, for all orders m. Since there are no such rules 
of order 1, order 2 scoring rules constitute the simplest nontrivial case, and 
as such are likely to be the most useful in practice. In a companion paper to 
this one, Ehm and Gneiting (2012) conduct a deep investigation of order 2 
proper local rules, using an elegant construction complementary to ours. 
They also describe a general class of densities for which the boundary terms 
vanish. 

The present paper confines attention to absolutely continuous distribu- 
tions on the real line. The notion of local scoring rule has an interesting 
analogue for a discrete sample space equipped with a given neighborhood 
structure. The theory for that case is developed in an accompanying paper 
[Dawid, Lauritzen and Parry (2012)]; it exhibits both close parallels with, 
and important differences from, the continuous case considered here. 

2. Scoring rules. Suppose You are required to express Your uncertainty
about an unobserved quantity $X \in \mathcal{X}$ by quoting a distribution $Q$
over $\mathcal{X}$, after which Nature will reveal the value $x$ of $X$. A scoring
rule or score $S$ [Dawid (1986)] is a special kind of loss function, intended to
measure the quality of Your quote $Q$ in the light of the realized outcome $x$:
$S(x, Q)$ is a real number interpreted as the loss You will suffer in this case.
The principles of Bayesian decision theory [Savage (1954)] now enjoin You to
minimize Your expected loss. If Your actual beliefs about $X$ are described
by a probability distribution $P$, You should thus quote that $Q$ that minimizes
$S(P, Q) := E_{X \sim P} S(X, Q)$. The scoring rule $S$ is termed proper (relative
to a class $\mathcal{P}$ of distributions over $\mathcal{X}$) when, for any fixed
$P \in \mathcal{P}$, the minimum over $Q \in \mathcal{P}$ is achieved at $Q = P$;
it is strictly proper when, further, this minimum is unique. Thus, under a
proper scoring rule, honesty is the best policy.

Associated with any proper scoring rule $S$ are a (generalized) entropy
function $H(P) := S(P, P)$ and a divergence function
$d(P, Q) := S(P, Q) - H(P)$. Under suitable technical conditions, proper
scoring rules and their associated entropy functions and divergence functions
enjoy certain properties that serve to characterize such "coherent"
constructions [Dawid (1998)]: $S(P, Q)$ is affine in $P$ and is minimized in $Q$
at $Q = P$; $H(P)$ is concave in $P$; $d(P, Q) - d(P, Q_0)$ is affine in $P$,
and $d(P, Q) \ge 0$, with equality achieved at $Q = P$.

If two scoring rules differ by a function of x only, then they will yield 
the identical divergence function. In this case we will term them equivalent 
[note that this is a more specialized usage than that of Dawid (1998)]. 

A fairly arbitrary statistical decision problem can be reduced to one based
on a proper scoring rule. Let $L: \mathcal{X} \times \mathcal{A} \to \mathbb{R}$
be a loss function, defined for outcome space $\mathcal{X}$ and action space
$\mathcal{A}$. Letting $\mathcal{P}$ be a class of distributions over $\mathcal{X}$
such that $L(P, a) := E_{X \sim P} L(X, a)$ exists for all $a \in \mathcal{A}$ and
$P \in \mathcal{P}$, define, for $P, Q \in \mathcal{P}$ and $x \in \mathcal{X}$,

(2)  $$S(x, Q) := L(x, a_Q),$$

where $a_P := \arg\inf_{a \in \mathcal{A}} L(P, a)$ is a Bayes act with respect
to $P$ (supposed to exist, and selected arbitrarily if nonunique). Then $S$ is
readily seen to be a proper scoring rule, and the associated entropy function
is just the Bayes loss: $H(P) = \inf_{a \in \mathcal{A}} L(P, a)$.
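
The construction in (2) can be illustrated numerically. The following is a minimal sketch under assumptions not made in the paper: a finite outcome space {0, 1, 2}, squared-error loss, and real-valued actions, so that the Bayes act is the mean. All function names are hypothetical. The resulting rule is proper but not strictly proper, since it depends on Q only through its mean.

```python
# Illustrative setting (an assumption): outcomes 0, 1, 2 and squared-error loss,
# for which the Bayes act a_Q is the mean of Q.

def squared_loss(x, a):
    """Loss L(x, a) for outcome x and action a."""
    return (x - a) ** 2

def bayes_act(q):
    """Bayes act a_Q under squared loss: the mean of the distribution q."""
    return sum(x * qx for x, qx in enumerate(q))

def score(x, q):
    """Scoring rule S(x, Q) = L(x, a_Q), as in (2)."""
    return squared_loss(x, bayes_act(q))

def expected_score(p, q):
    """S(P, Q) = E_{X~P} S(X, Q) for distributions given as probability lists."""
    return sum(px * score(x, q) for x, px in enumerate(p))

p = [0.2, 0.5, 0.3]
candidates = [[0.6, 0.3, 0.1], [1 / 3, 1 / 3, 1 / 3], [0.0, 0.1, 0.9]]

entropy = expected_score(p, p)  # H(P) = S(P, P), the Bayes loss (variance of P)
divergences = [expected_score(p, q) - entropy for q in candidates]
```

Each entry of `divergences` is nonnegative, which is the propriety property for this particular loss.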

In this paper we focus attention on the case that $\mathcal{X}$ is an interval
on the real line and any $Q \in \mathcal{P}$ has a density $q(\cdot)$ with respect
to Lebesgue measure on $\mathcal{X}$. We may then define $S(x, Q)$ in terms of
$q$. However, since $q$ is only defined almost everywhere we must take care that
any manipulations performed either involve a preferred version of $q$, or yield
the same answer when $q$ is changed on a null set. This will always be the case
in this paper.

2.1. Bregman scoring rule. Since any decision problem generates a proper
scoring rule there is a very great number of these. Certain forms are of
special interest or simplicity. Here we describe one important class of such
rules for the case that every $Q$ has a density function $q(\cdot)$ with respect
to a dominating measure $\mu$ over $\mathcal{X}$.

Let $\phi: \mathbb{R}^+ \to \mathbb{R}$ be concave and differentiable. The
associated (separable) Bregman scoring rule is defined by

(3)  $$S(x, Q) := \phi'\{q(x)\} + \int d\mu(y)\, [\phi\{q(y)\} - q(y)\, \phi'\{q(y)\}].$$

It can be shown that these are the only proper scoring rules having the form
$S(x, Q) = \xi\{q(x)\} - k(Q)$ [Dawid (2007)].
Taking expectations, we obtain

$$S(P, Q) = \int d\mu(x)\, [\{p(x) - q(x)\}\, \phi'\{q(x)\} + \phi\{q(x)\}].$$

It follows that $H(P) = \int d\mu(x)\, \phi\{p(x)\}$ and so, assuming $H(P)$ is
finite, the corresponding (separable) Bregman divergence [Bregman (1967),
Csiszár (1991)], also termed U-divergence [Eguchi (2008)], is

(4)  $$d(P, Q) = \int d\mu(x)\, \bigl(\phi\{q(x)\} + \{p(x) - q(x)\}\, \phi'\{q(x)\} - \phi\{p(x)\}\bigr).$$

The integrand is nonnegative by concavity of $\phi$. Therefore, the separable
Bregman scoring rule is a proper scoring rule, and strictly proper if $\phi$ is
strictly concave.
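
As a hedged illustration of the separable Bregman construction, the sketch below evaluates the score on a finite space with counting measure as the dominating measure (an assumption made only to keep the example computable). The two concave functions used, a quadratic and the Shannon choice, and all function names are illustrative.

```python
import math

def bregman_score(x, q, phi, dphi):
    """Separable Bregman score on a finite space with counting measure.

    q is a list of probabilities, x an index into it, phi a concave function
    and dphi its derivative.
    """
    tail = sum(phi(qy) - qy * dphi(qy) for qy in q)
    return dphi(q[x]) + tail

def expected_score(p, q, phi, dphi):
    """S(P, Q) = sum_x p(x) S(x, Q)."""
    return sum(px * bregman_score(x, q, phi, dphi) for x, px in enumerate(p))

def divergence(p, q, phi, dphi):
    """d(P, Q) = S(P, Q) - S(P, P)."""
    return expected_score(p, q, phi, dphi) - expected_score(p, p, phi, dphi)

# Two concave choices of phi, each paired with its derivative.
quad = (lambda s: -s * s, lambda s: -2.0 * s)
shannon = (lambda s: -s * math.log(s), lambda s: -math.log(s) - 1.0)

p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

d_quad = divergence(p, q, *quad)     # reduces to sum_x (p(x) - q(x))^2
d_kl = divergence(p, q, *shannon)    # reduces to sum_x p(x) ln{p(x)/q(x)}
```

With the Shannon choice the score itself collapses to minus the log of the quoted probability, because the integral term equals the total mass of q.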

2.2. Extended Bregman score. A straightforward generalization of the
above Bregman construction is obtained on replacing
$\phi: \mathbb{R}^+ \to \mathbb{R}$ throughout by
$\phi: \mathcal{X} \times \mathbb{R}^+ \to \mathbb{R}$, such that, for each
$x \in \mathcal{X}$, $\phi(x, \cdot): \mathbb{R}^+ \to \mathbb{R}$ is concave.
Such extended Bregman rules are the only proper scoring rules of the form
$S(x, Q) = \xi\{x, q(x)\} - k(Q)$ [Dawid (2007)].

2.3. Log score. For $\phi(s) = -s \ln s$ we obtain the logarithmic scoring rule,
or log score, defined by

$$S(x, Q) = -\ln q(x).$$

This is essentially the only scoring rule of the form $S(x, Q) = \xi\{x, q(x)\}$
[Bernardo (1979), Dawid (2007)]. For this case we obtain

$$H(P) = -\int d\mu(x)\, p(x) \ln p(x),$$

the Shannon entropy, and

(5)  $$d(P, Q) = \int d\mu(x)\, p(x) \ln \frac{p(x)}{q(x)},$$

the Kullback-Leibler divergence.

2.4. Parameter estimation. Let $\mathcal{Q} = \{Q_\theta\} \subset \mathcal{P}$
be a smooth parametric family of distributions. Given data $(x_1, \ldots, x_n)$
in $\mathcal{X}$ with empirical distribution $\hat{P}$, one way to estimate
$\theta$ is by minimizing some divergence criterion:
$\hat{\theta} := \arg\min_\theta d(\hat{P}, Q_\theta)$. When the divergence
function is derived from a scoring rule, this is equivalent to minimizing the
total empirical score:

(6)  $$\hat{\theta} = \arg\min_\theta \sum_{i=1}^{n} S(x_i, Q_\theta),$$

in which form it remains meaningful even if $\hat{P} \notin \mathcal{P}$, when
$d(\hat{P}, Q_\theta)$ is undefined. The corresponding estimating equation is

(7)  $$\sum_{i=1}^{n} \sigma(x_i, \theta) = 0$$

with $\sigma(x, \theta) := \partial S(x, Q_\theta)/\partial\theta$. For a proper
scoring rule it is straightforward to show that the estimating equation (7) is
unbiased [Dawid and Lauritzen (2005)] and, as a result, $\hat{\theta}$ is
typically consistent, though not necessarily efficient; it may also display some
degree of robustness. Equation (7) delivers an M-estimator [Huber (1981),
Hampel et al. (1986)]. Statistical properties of the estimator are considered by
Eguchi (2008) for the special case of minimum Bregman (U-) divergence
estimation, and readily extend to more general cases.

2.5. Hyvärinen scoring rule. Hyvärinen (2005) showed that minimization
of the divergence $d(P, Q)$ in (1) is equivalent to minimizing the empirical
score for the scoring rule

(8)  $$S(x, Q) = \Delta \ln q(x) + \tfrac{1}{2} |\nabla \ln q(x)|^2.$$

This is valid in the case where $\mathcal{X} = \mathbb{R}^k$ and $\mathcal{P}$
consists of distributions $P$ whose Lebesgue density $p(\cdot)$ is a twice
continuously differentiable function of $x$ satisfying $\nabla \ln p \to 0$ as
$|x| \to \infty$. For $k = 1$ we get

(9)  $$S(x, Q) = \frac{q''(x)}{q(x)} - \frac{1}{2} \left\{ \frac{q'(x)}{q(x)} \right\}^2.$$

Dawid and Lauritzen (2005) showed that, with some reinterpretation,
the formula (8) defines a proper scoring rule in the more general case of
an outcome space $\mathcal{X}$ that is a Riemannian manifold. Now $q(\cdot)$
denotes the natural density $dQ/d\mu$ of $Q$ with respect to the associated
volume measure $\mu$ on $\mathcal{X}$; $\nabla$ denotes natural gradient;
$\Delta$ is the Laplace-Beltrami operator; and $|u|^2 = \langle u, u \rangle$ is
the squared norm defined by the metric tensor. We impose the restriction
$P, Q \in \mathcal{P}$, where $P \in \mathcal{P}$ if $\nabla \ln p(x) \to 0$ as
$x$ approaches the boundary of $\mathcal{X}$.

On applying Stokes's theorem (again essentially integration by parts) and
noting that boundary terms vanish, we can express the expected score as

$$S(P, Q) = \frac{1}{2} \int d\mu(x)\, p(x)\, \langle \nabla \ln q(x) - 2 \nabla \ln p(x), \nabla \ln q(x) \rangle.$$

The entropy is thus $H(P) = -\frac{1}{2} \int d\mu(x)\, p(x)\, |\nabla \ln p(x)|^2$
and so the associated divergence is essentially that used by Hyvärinen:

(10)  $$d(P, Q) = \frac{1}{2} \int d\mu(x)\, p(x)\, |\nabla \ln p(x) - \nabla \ln q(x)|^2,$$

which is nonnegative and vanishes only when $Q = P$. It follows that the
scoring rule is strictly proper.

Although this scoring rule is not local in the strict sense, it depends on
$(x, Q)$ only through the first and second derivatives of the density
function $q(\cdot)$ at the point $x$; it is local of order 2, or 2-local, as
defined below in Section 3.

Note that one does not need to know the volume measure $\mu$ to calculate
the divergence; formula (10) for $d(P, Q)$ yields the same result if we take
$\mu$ to be any fixed underlying measure, and interpret $p$ and $q$ as densities
with respect to this.

2.6. Homogeneity. An interesting and practically valuable property of
the generalized Hyvärinen scoring rule is that $S(x, Q)$ given by (8) is
homogeneous in the density function $q(\cdot)$: it is formally unchanged if
$q(\cdot)$ is multiplied by a positive constant, and so can be computed even if
we only know the density function up to a scale factor. In particular, use of
the estimating equation (7) does not require knowledge of the normalizing
constant (which is often hard to obtain) for densities in $\mathcal{Q}$.
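
The homogeneity property can be checked numerically in one dimension, where the Hyvärinen score is q''(x)/q(x) minus half the square of q'(x)/q(x). The sketch below is illustrative only: it approximates the derivatives by central differences (a numerical shortcut assumed here, with an untuned step size), and the Gaussian test density and function names are hypothetical choices.

```python
import math

def hyvarinen_score(q, x, h=1e-3):
    """One-dimensional Hyvarinen score: q''(x)/q(x) - (1/2) * (q'(x)/q(x))**2.

    q is any positive, twice-differentiable function proportional to the
    density; derivatives are approximated by central differences with step h.
    """
    q0 = q(x)
    q1 = (q(x + h) - q(x - h)) / (2.0 * h)
    q2 = (q(x + h) - 2.0 * q0 + q(x - h)) / (h * h)
    return q2 / q0 - 0.5 * (q1 / q0) ** 2

def unnormalized_gaussian(mu, sigma):
    """exp{-(x - mu)^2 / (2 sigma^2)}, with the normalizing constant omitted."""
    return lambda x: math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

q = unnormalized_gaussian(mu=1.0, sigma=2.0)
scaled_q = lambda x: 7.5 * q(x)   # same density up to a constant factor

x = 0.3
s_unnormalized = hyvarinen_score(q, x)
s_scaled = hyvarinen_score(scaled_q, x)

# Closed form for a Gaussian: -1/sigma^2 + (x - mu)^2 / (2 sigma^4).
s_exact = -1.0 / 4.0 + (x - 1.0) ** 2 / (2.0 * 16.0)
```

The two numerical scores agree with each other and with the closed form up to discretization error, even though neither `q` nor `scaled_q` integrates to one.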

Example 2.1. Consider the natural exponential family:

$$q(x \mid \theta) = Z(\theta)^{-1} \exp\{a(x) + \theta x\}.$$

Using the scoring rule (9) we obtain
$S(x, Q_\theta) = a''(x) + \frac{1}{2}\{a'(x) + \theta\}^2$, so
that $\sigma(x, \theta) = a'(x) + \theta$, and (7) delivers the unbiased estimator

$$\hat{\theta} = -\sum_{i=1}^{n} a'(X_i)/n,$$

which can be computed without knowledge of $Z(\theta)$. See also Section 4 of
Hyvärinen (2007), where exponential families are discussed.

Alternatively, we can work directly with the sufficient statistic
$T := \sum_{i=1}^{n} X_i$, which has density of the form

$$q_T(t \mid \theta) = Z(\theta)^{-n} \exp\{a_n(t) + \theta t\}.$$

Applying the above method to $q_T$ leads to the unbiased estimator
$\hat{\theta} = -a_n'(T)$. This is the maximum plausibility estimator of
$\theta$ [Barndorff-Nielsen (1976)]. Basing the estimate on the sufficient
statistic is more satisfying and better behaved from a principled point of view,
but does require computation of the function $a_n(t)$, which involves an
$n$-fold convolution of $e^{a(x)}$.

As an application, suppose $Q_\theta$ is obtained from the normal distribution
$N(\theta, 1)$ by retaining its outcome $x$ with probability $k(x)$. We assume
that $k(x)$ is everywhere positive and twice differentiable. The density is thus

$$q(x \mid \theta) = \frac{k(x) \exp\{-(x - \theta)^2/2\}}{\int k(y) \exp\{-(y - \theta)^2/2\}\, dy}.$$

Because of the complex dependence of the denominator on $\theta$, the maximum
likelihood estimate typically cannot be expressed in closed form. However,
using scoring rule (9) yields the explicit unbiased estimator

$$\hat{\theta} = \sum_{i=1}^{n} \{x_i - \kappa'(x_i)\}/n$$

with $\kappa(x) := \ln k(x)$.
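
The explicit estimator for the selection model can be tried in a small simulation. This is a sketch under stated assumptions: the retention probability is taken to be a logistic curve (a hypothetical choice; the paper leaves k unspecified), for which the derivative of ln k(x) is 1/(1 + e^x), and the sample size and seed are arbitrary.

```python
import math
import random

def retain_prob(x):
    """Illustrative retention probability k(x): a logistic curve in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def kappa_prime(x):
    """Derivative of kappa(x) = ln k(x) for the logistic k above."""
    return 1.0 / (1.0 + math.exp(x))

def sample_selected(theta, n, rng):
    """Draw n retained outcomes: X ~ N(theta, 1), kept with probability k(X)."""
    out = []
    while len(out) < n:
        x = rng.gauss(theta, 1.0)
        if rng.random() < retain_prob(x):
            out.append(x)
    return out

def selection_estimate(xs):
    """Explicit estimator: the mean of x_i - kappa'(x_i)."""
    return sum(x - kappa_prime(x) for x in xs) / len(xs)

rng = random.Random(0)
theta_true = 0.5
xs = sample_selected(theta_true, 20000, rng)

theta_hat = selection_estimate(xs)
naive_mean = sum(xs) / len(xs)   # ignores selection, so it overshoots theta
```

No normalizing integral is evaluated anywhere; the correction term `kappa_prime` is all that separates the estimator from the naive sample mean.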

The homogeneity property will be a feature of all the new proper local 
scoring rules we introduce here: see Section 6. 

3. Local scoring rules. We observed in Section 2.3 that the log score
$S(x, Q) = -\ln q(x)$ is essentially the only proper scoring rule that is local,
that is, involves the density function $q(\cdot)$ of $Q$ only through its value,
$q(x)$, at the actually realized value $x$ of $X$.

We can, however, weaken the locality requirement, for example, by
allowing $S$ to depend on the values of $q(\cdot)$ in an infinitesimal
neighborhood of $x$. In this paper we describe a class of scoring rules that
depend on the function $q(\cdot)$ only through its value and the values of a
finite number $m$ of its derivatives at the point $x$, a property we will refer
to as locality of order $m$, or $m$-locality.

We confine ourselves to the case that $\mathcal{X}$ is an open interval in
$\mathbb{R}$, possibly infinite or semi-infinite, and $\mathcal{P}$ is a class of
distributions $Q$ over $\mathcal{X}$ having strictly positive Lebesgue density
$q(\cdot)$ that is $m$-times continuously differentiable.

3.1. Local functions and scoring rules. To study the properties of local 
scoring rules we need a formal definition of a local function. 

Definition 3.1. A function $F: \mathcal{X} \times \mathcal{P} \to \mathbb{R}$ is
said to be local of order $m$, or $m$-local, if it can be expressed in the form

$$F(x, Q) = f\{x, q(x), q'(x), q''(x), \ldots, q^{(m)}(x)\},$$

where $f: \mathcal{X} \times \mathcal{Q}_m \to \mathbb{R}$, with
$\mathcal{Q}_m := \mathbb{R}^+ \times \mathbb{R}^m$, is a real-valued infinitely
differentiable function, $q(\cdot)$ is the density function of $Q$, and a prime
(') denotes differentiation with respect to $x$. It is local if it is local of
some finite order.

We shall refer to such a function $f$ as a q-function, and say it is of order
$m$. When we do not need to specify the order $m$ of a q-function $f$ we may
write $f(x, \mathbf{q})$ ($x \in \mathcal{X}$, $\mathbf{q} \in \mathcal{Q}$),
understanding $\mathcal{Q} = \mathcal{Q}_m$, $\mathbf{q} = (q_0, \ldots, q_m)$.

A scoring rule $S(x, Q)$ is $m$-local if

(11)  $$S(x, Q) = s\{x, q(x), q'(x), q''(x), \ldots, q^{(m)}(x)\},$$

where $s$ is a q-function as above, so that it depends on the quoted
distribution $Q$ for $X$ only through the value and derivatives up to order $m$
of the density $q(\cdot)$ of $Q$, evaluated at the observed value $x$ of $X$.
The function $s$ is the score function of $S$.

3.2. Differentiation of local functions. For a local scoring rule $S$ given
by (11) we write

$$S_{[j]}(x, Q) := s_{[j]}\{x, q(x), q'(x), q''(x), \ldots, q^{(m)}(x)\},$$

where $s_{[j]} := \partial s/\partial q_j$, and similarly

$$S_{[x]}(x, Q) := s_{[x]}\{x, q(x), \ldots, q^{(m)}(x)\},$$

where $s_{[x]} := \partial s/\partial x$. Then if $dS/dx$ denotes the derivative
of $S(x, Q)$ with respect to $x$ for fixed $Q$, we have

(12)  $$\frac{dS}{dx} = S_{[x]}(x, Q) + \sum_{j \ge 0} q^{(j+1)}(x)\, S_{[j]}(x, Q).$$

For $S$ of order $m$, the series in (12) terminates at $j = m$.

Motivated by (12), we introduce a linear differential operator $D$ acting
on q-functions by

(13)  $$D := \frac{\partial}{\partial x} + \sum_{j \ge 0} q_{j+1} \frac{\partial}{\partial q_j}.$$

For $f$ of order $m$, the series for $Df$ obtained from (13) terminates at
$j = m$, and $Df$ is then of order $m + 1$.

The operator $D$ thus represents the total derivative of the local function
for fixed $Q$:

$$\frac{dS}{dx} = (Ds)\{x, q(x), \ldots, q^{(m+1)}(x)\},$$

where $s$ is the score function of $S$.

In the light of the interpretation of $D$ as $d/dx$, the following result is
unsurprising:

Lemma 3.1. For a q-function $f$, $Df = 0$ if and only if $f$ is constant.

Proof. "If" is trivial. For "only if," suppose $f$ is of order $\le m$. The
only term in $Df$ involving $q_{m+1}$ is $q_{m+1} f_{[m]}$, so that
$Df = 0 \Rightarrow f_{[m]} = 0$, whence $f$ is of order at most $m - 1$.
Repeating this argument, $f$ must be of order 0, that is, of the form $f(x)$.
Then $0 = Df = f'(x)$, so finally $f$ must be a constant. □
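
To make the role of the total derivative operator concrete, the sketch below applies it numerically: D f is taken to be the partial derivative of f in x plus the sum over j of q_{j+1} times the partial derivative of f in q_j, and the result is compared with the ordinary derivative of x ↦ f(x, q(x), q'(x)). The q-function `f`, the unnormalized Gaussian test density and the finite-difference step are all illustrative assumptions.

```python
import math

def partial(f, args, i, h=1e-5):
    """Central-difference approximation to the i-th partial derivative of f."""
    up = list(args)
    dn = list(args)
    up[i] += h
    dn[i] -= h
    return (f(*up) - f(*dn)) / (2.0 * h)

def total_derivative(f, order):
    """Return Df for a q-function f(x, q0, ..., q_order).

    The result takes one extra argument: (x, q0, ..., q_{order+1}).
    """
    def Df(x, *q):
        args = (x,) + q[: order + 1]
        value = partial(f, args, 0)                       # df/dx
        for j in range(order + 1):
            value += q[j + 1] * partial(f, args, j + 1)   # q_{j+1} * df/dq_j
        return value
    return Df

# Hypothetical order-1 q-function, used only for illustration.
f = lambda x, q0, q1: x * q1 / q0

# Unnormalized standard normal density and its first two derivatives.
q = lambda x: math.exp(-0.5 * x * x)
dq = lambda x: -x * q(x)
d2q = lambda x: (x * x - 1.0) * q(x)

x = 0.7
Df = total_derivative(f, order=1)
via_operator = Df(x, q(x), dq(x), d2q(x))

# Here F(x) = f(x, q(x), q'(x)) = -x^2, whose ordinary derivative is -2x.
direct = -2.0 * x
```

Note that `Df` needs one more derivative of the density than `f` does, matching the statement that D raises the order by one.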

4. Variational analysis. We are interested in constructing proper local
scoring rules. Ideally we would develop sufficient conditions on the score
function $s$ and the family $\mathcal{P}$ to ensure that, for any
$P \in \mathcal{P}$,
$S(P, Q) = \int dx\, p(x)\, s\{x, q(x), q'(x), q''(x), \ldots, q^{(m)}(x)\}$
is minimized, over $Q \in \mathcal{P}$, at $Q = P$. Initially, however, we shall
merely develop, in a somewhat heuristic fashion, conditions sufficient to ensure
that, for all $P \in \mathcal{P}$, $Q = P$ will be a stationary point of
$S(P, Q)$, a property we shall describe by saying that $S$ is a weakly proper
scoring rule. Given any $S$ satisfying these conditions, further attention will
be required to check whether or not it is in fact proper; this will be taken up
in Section 9 below.

To address this problem we adopt the methods of variational calculus
[Troutman (1983), van Brunt (2004)]. Suppose that, at $Q = P$, $S(P, Q)$ is
stationary under an arbitrary infinitesimal variation $\delta q(\cdot)$ of
$q(\cdot)$, subject to the requirement that $q(\cdot) + \delta q(\cdot)$ be a
probability density. That is,

(14)  $$\delta \left\{ \int dx\, p(x)\, s\{x, q(x), q'(x), q''(x), \ldots, q^{(m)}(x)\} + \lambda_P \int dx\, q(x) \right\} \bigg|_{q=p} = 0,$$

where $\lambda_P$ is a Lagrange multiplier for the normalization constraint
$\int dx\, q(x) = 1$. The left-hand side of (14), evaluated with $P = Q$, is

(15)  $$\int dx \left\{ \sum_{k=0}^{m} \delta q^{(k)}(x)\, q(x)\, S_{[k]}(x, Q) + \lambda_Q\, \delta q(x) \right\},$$

and this is to vanish for arbitrary infinitesimal $\delta q(\cdot)$ and suitable
$\lambda_Q$.

We evaluate the integral of the $k$th term of the sum in (15) using the
general formula for repeated integration by parts:

(16)  $$(-1)^k \int dx\, F G^{(k)} = \int dx\, G F^{(k)} - \sum_{r=0}^{k-1} (-1)^{k-1-r} \bigl\{ G^{(k-1-r)} F^{(r)} \bigr\} \Big|_{-}^{+},$$

where $F^{(k)}$ denotes the $k$th derivative of $F$ with respect to $x$. The
first term on the right-hand side of (16) is the integral term; the remaining
terms are boundary terms, these being evaluated, if necessary, as limits as we
approach, from within, the end-points (denoted by $-$ and $+$) of the interval
$\mathcal{X} \subseteq \mathbb{R}$. Setting $G = q(x) S_{[k]}(x, Q)$,
$F = \delta q(x)$, we obtain

(17)  $$\int dx\, q(x) S_{[k]}(x, Q)\, \delta q^{(k)}(x) = \int dx\, (-1)^k \delta q(x) \frac{d^k}{dx^k} \{q(x) S_{[k]}(x, Q)\} + \sum_{r=0}^{k-1} (-1)^{k-1-r} \frac{d^{k-1-r}}{dx^{k-1-r}} \{q(x) S_{[k]}(x, Q)\}\, \delta q^{(r)}(x) \Big|_{-}^{+}.$$

At this point we restrict consideration to functions $\delta q$ whose
derivatives vanish sufficiently quickly at the end-points that we can suppose
the boundary terms in the last line of (17) vanish. Then (15) will vanish for
all such $\delta q(\cdot)$ if

(18)  $$\sum_{k=0}^{m} (-1)^{k+1} \frac{d^k}{dx^k} \{q(x) S_{[k]}(x, Q)\} = \lambda_Q,$$

that is, the left-hand side of (18) is a constant, independent of $x$.

Motivated by (18), we introduce the following linear differential
operator $L$ on q-functions:

(19)  $$L := \sum_{k \ge 0} (-1)^{k+1} D^k q_0 \frac{\partial}{\partial q_k}.$$

Unless overridden by parentheses, operators here and elsewhere associate to
the right, so that $T q_0 f$ means $T(q_0 f)$, that is, we have

(20)  $$Lf = \sum_{k \ge 0} (-1)^{k+1} D^k \left( q_0 \frac{\partial f}{\partial q_k} \right).$$

For $f$ of order $m$, the series in (20) terminates at $k = m$, and the order of
$Lf$ is at most $2m$.

We can now write (18) as

(21)  $$Ls = \lambda_Q,$$

where equality in (21) is required to hold for all $(x, q_0, q_1, \ldots, q_{2m})$
such that $q_j = q^{(j)}(x)$ ($j = 0, \ldots, 2m$). In particular, a sufficient
condition that $S$ be weakly proper is that for some $\lambda \in \mathbb{R}$ we
have

(22)  $$Ls = \lambda$$

for all $x \in \mathcal{X}$, $\mathbf{q} \in \mathcal{Q}_{2m}$.

So long as $\mathcal{P}$ is sufficiently large, the form (22) will also be
necessary for (21) to hold. In particular, suppose we impose the following
condition on $\mathcal{P}$:

Condition 4.1. Given distinct $x_1, x_2 \in \mathcal{X}$, and any
$\mathbf{q}_1, \mathbf{q}_2 \in \mathcal{Q}_{2m}$, there exists
$Q \in \mathcal{P}$ satisfying $q^{(j)}(x_1) = q_{1,j}$,
$q^{(j)}(x_2) = q_{2,j}$ ($j = 0, \ldots, 2m$).

Take arbitrary $Q_1, Q_2 \in \mathcal{P}$, $x_1 \ne x_2 \in \mathcal{X}$, and set
$q_{i,j} := q_i^{(j)}(x_i)$ ($i = 1, 2$; $j = 0, \ldots, 2m$). Let $Q$ be as
given by Condition 4.1. Evaluating (21) at $(x_1, \mathbf{q}_1)$ yields
$\lambda_{Q_1} = \lambda_Q$, and similarly $\lambda_{Q_2} = \lambda_Q$. Thus
$\lambda_Q$ cannot depend on $Q$, and so (22) must hold. Moreover, taking
$x_1 = x$, $\mathbf{q}_1 = \mathbf{q}$, and $x_2$, $\mathbf{q}_2$ arbitrary,
this must hold for any $x \in \mathcal{X}$, $\mathbf{q} \in \mathcal{Q}_{2m}$.

So we henceforth restrict attention to solutions of (22). We note that
a particular solution of (22) is given by the log-score:

$$s = -\lambda \ln q_0.$$

Since $L$ is a linear operator, the general solution is of the form

$$s = -\lambda \ln q_0 + s_0,$$

where $s_0$ satisfies the key equation:

(23)  $$L s_0 = 0.$$

Because of this we shall confine attention to solutions of the key equation,
and shall term any solution of (23) a key local score function.
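
As a short worked check of the claim that the log-score is a particular solution, take the operator $L$ to be $\sum_{k \ge 0} (-1)^{k+1} D^k q_0\, \partial/\partial q_k$, acting to the right. Since $s = -\lambda \ln q_0$ does not involve $q_1, q_2, \ldots$, only the $k = 0$ term survives:

```latex
\begin{align*}
L s &= \sum_{k \ge 0} (-1)^{k+1} D^k \Bigl( q_0 \frac{\partial s}{\partial q_k} \Bigr)
     = -\, q_0 \frac{\partial}{\partial q_0} \bigl( -\lambda \ln q_0 \bigr)
     = -\, q_0 \cdot \Bigl( -\frac{\lambda}{q_0} \Bigr)
     = \lambda .
\end{align*}
```

So the log-score carries the whole Lagrange multiplier, and whatever remains of a weakly proper score function must be annihilated by $L$.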



4.1. Connection to classical calculus of variations. Because the Lagrange
multiplier $\lambda$ associated with a key local scoring rule $s$ vanishes,
setting $q(x) = p(x)$ will in fact deliver a globally stationary point [i.e.,
without imposing the normalization constraint $\int dx\, q(x) = 1$] of the
corresponding expected score

$$\int dx\, p(x)\, s\{x, q(x), q'(x), q''(x), \ldots, q^{(m)}(x)\}.$$

The classical calculus of variations [see, for example, van Brunt (2004),
equations (2.9), (3.3)] would (again, ignoring the boundary terms) identify
the solution to this unconstrained variational problem in $q(\cdot)$ as solving
the Euler-Lagrange equation:

(24)  $$\Lambda p_0 s(x, q_0, \ldots, q_m) = 0,$$

where $\Lambda$ is the Lagrange operator:

(25)  $$\Lambda := \sum_{k \ge 0} (-1)^k D^k \frac{\partial}{\partial q_k}.$$

We want the solution of (24) to be $q = p$.

Now when evaluated at $q = p$, $\Lambda q_0 s = s + \Lambda p_0 s$. So $q = p$
should satisfy

(26)  $$(I - \Lambda q_0) s = 0,$$

where $I$ is the identity operator.

But

$$\frac{\partial}{\partial q_k} q_0 \circ = \begin{cases} q_0 \dfrac{\partial}{\partial q_k} & (k > 0), \\[1ex] I + q_0 \dfrac{\partial}{\partial q_0} & (k = 0), \end{cases}$$

so we have

(27)  $$L = I - \Lambda q_0 \circ.$$

(Here and throughout, for $g$ a q-function, $g \circ$ denotes the multiplication
operator $f \mapsto gf$, the optional symbol $\circ$ being attached, where
required, to avoid confusion with the q-function $g$ itself.)

Hence (26) becomes $Ls = 0$, so recovering the key equation (23).

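5. Properties of differential operators. For a further study of the key
equation and its properties we shall have a detailed look at the differential
operators introduced earlier, together with some new ones. We recall

$$D = \frac{\partial}{\partial x} + \sum_{j \ge 0} q_{j+1} \frac{\partial}{\partial q_j}, \qquad \Lambda = \sum_{k \ge 0} (-1)^k D^k \frac{\partial}{\partial q_k} = (I - L) q_0^{-1} \circ.$$

Lemma 5.1. The Lagrange operator $\Lambda$ annihilates the total derivative
operator $D$:

(28)  $$\Lambda D = 0.$$

Proof. Using $(\partial/\partial q_k) D = D (\partial/\partial q_k) + (\partial/\partial q_{k-1})$
we have

$$\Lambda D = \sum_{k \ge 0} (-1)^k D^{k+1} \frac{\partial}{\partial q_k} + \sum_{k \ge 1} (-1)^k D^k \frac{\partial}{\partial q_{k-1}} = 0.$$

□

We now introduce the Euler operator:

(29)  $$E := \sum_{k \ge 0} q_k \frac{\partial}{\partial q_k}.$$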
(29) 

Lemma 5.2. The Euler operator $E$ commutes with $D$ and with $L$, while

(30)  $$\Lambda E = E \Lambda + \Lambda.$$

Proof. From the easily verified relations

(31)  $$E q_j \circ = q_j \circ + q_j E, \qquad (\partial/\partial q_k) E = E (\partial/\partial q_k) + (\partial/\partial q_k),$$

it readily follows that $E$ commutes with $q_j (\partial/\partial q_k)$. Since
clearly $E$ commutes with $\partial/\partial x$, $E$ thus commutes with $D$, and
consequently with any power of $D$. From (19), we now see that $E$ commutes
with $L$.

Now (27) gives that $E$ commutes with $\Lambda q_0 \circ$, and thus, noting from
(31) that $q_0 E q_0^{-1} \circ = E - I$, we have

$$E \Lambda = E \Lambda q_0 q_0^{-1} \circ = \Lambda q_0 E q_0^{-1} \circ = \Lambda (E - I) = \Lambda E - \Lambda,$$

which yields (30). □

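Theorem 5.3. We have that $\Lambda E = \Lambda q_0 \Lambda$.

Proof. Using $q_k D = D q_k \circ - q_{k+1} \circ$, we can readily show by
induction that, for $k \ge 0$,

$$q_0 D^k = \sum_{j=0}^{k} (-1)^j \binom{k}{j} D^{k-j} q_j \circ.$$

It now follows from (28) that $\Lambda q_0 D^k = (-1)^k \Lambda q_k \circ$.
Applying this term-by-term to (25) we obtain
$\Lambda q_0 \Lambda = \sum_k \Lambda q_k (\partial/\partial q_k) = \Lambda E$. □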

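For later purposes we introduce, for any integer $r$,

(32)  $$B_r := \sum_{k \ge r+1} (-1)^{k-1-r} D^{k-1-r} \frac{\partial}{\partial q_k}$$

with the understanding $\partial/\partial q_k = 0$ if $k < 0$ (in particular,
$B_{-1} = \Lambda$). We further define

(33)  $$C := \sum_{r \ge 0} q_r B_r.$$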
r>0 

Lemma 5.4. It holds that

(34)  $$D B_r = (\partial/\partial q_r) - B_{r-1},$$

(35)  $$B_r D = (\partial/\partial q_r).$$

Proof. Equation (34) follows easily from the definition (32), while (35)
can be proved in the same way as Lemma 5.1. □

Theorem 5.5. We have

(36)  $$CD = E,$$

(37)  $$DC = E - q_0 \Lambda.$$

Proof. Equation (36) follows directly from (33) and (35). From (34),
$D q_r B_r = q_{r+1} B_r - q_r B_{r-1} + q_r (\partial/\partial q_r)$, and thus
$DC = \sum_{r \ge 0} q_r (\partial/\partial q_r) - q_0 B_{-1} = E - q_0 \Lambda$. □

6. Homogeneous scoring rules. We shall see that all key local score
functions, that is, solutions to the key equation $Ls = 0$, are homogeneous in
the sense that changing $q$ by a multiplicative factor does not change the value
of $s$; hence the associated scoring rule $S(x, Q)$ only involves the density $q$
up to a constant factor. We shall formalize and show this below.

Definition 6.1. A q-function $f$ is said to be homogeneous of degree $h$,
or $h$-homogeneous, if, for any $\lambda > 0$,
$f(x, \lambda \mathbf{q}) = \lambda^h f(x, \mathbf{q})$.

With $E$ defined by (29), Euler's homogeneous function theorem implies
that a q-function $f$ is $h$-homogeneous if and only if

(38)  $$Ef = hf.$$

It follows that, if $f$ is $h$-homogeneous, then so is $Df$.

In this work we shall only need to deal with homogeneity of degree 0,
where $f(x, \lambda \mathbf{q}) = f(x, \mathbf{q})$, and of degree 1, where
$f(x, \lambda \mathbf{q}) = \lambda f(x, \mathbf{q})$. Clearly, $f$ is
0-homogeneous if and only if $q_0 f$ is 1-homogeneous.

A scoring rule $S$ will be called homogeneous if its score function $s$ is
0-homogeneous. As already noted, the logarithmic score is 0-local, but is
not homogeneous. The Hyvärinen scoring rule and its generalizations, as
described in Section 2.5, are 2-local, and are homogeneous.

We can now easily show that key local score functions are homogeneous:

Theorem 6.1. If $Lf = 0$, then $f$ is 0-homogeneous.

Proof. In this case $f = (I - L) f = \Lambda q_0 f$ and thus from (30) and
Theorem 5.3 we get

$$Ef = E \Lambda q_0 f = \Lambda E q_0 f - \Lambda q_0 f = \Lambda q_0 (\Lambda q_0 f) - \Lambda q_0 f = \Lambda q_0 f - \Lambda q_0 f = 0$$

as required. □
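
Homogeneity can also be probed numerically through the Euler operator, taken here as E f = sum over k of q_k times the partial derivative of f in q_k. The sketch below is an illustration under assumptions: partial derivatives are approximated by central differences, the test point is arbitrary, and the two q-functions are the Hyvärinen score function and the 1-homogeneous function shown in Section 6 to generate it.

```python
def euler(f, x, q, h=1e-6):
    """Approximate (E f)(x, q) = sum_k q_k * df/dq_k by central differences."""
    total = 0.0
    for k in range(len(q)):
        up = list(q)
        dn = list(q)
        up[k] += h
        dn[k] -= h
        total += q[k] * (f(x, *up) - f(x, *dn)) / (2.0 * h)
    return total

# Hyvarinen score function, written as a q-function of order 2.
s = lambda x, q0, q1, q2: q2 / q0 - 0.5 * (q1 / q0) ** 2
# A 1-homogeneous q-function of order 1.
phi = lambda x, q0, q1: -0.5 * q1 * q1 / q0

x = 0.0
q = [1.3, -0.4, 2.1]                     # arbitrary test point with q0 > 0

e_s = euler(s, x, q)                     # close to 0: s is 0-homogeneous
e_phi = euler(phi, x, q[:2])             # close to phi: phi is 1-homogeneous
scaled = s(x, *[3.0 * v for v in q])     # s is unchanged when q is rescaled
```

The values `e_s` and `e_phi` correspond to degrees 0 and 1 in Euler's homogeneous function theorem.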

We can further show that, if we consider the restriction of the operator $L$
to 0-homogeneous functions, it acts as a projection operator; $I - L$ is then
the complementary projection. This is a consequence of the fact that $L$ is
idempotent when restricted to 0-homogeneous functions:

Theorem 6.2. If $f$ is 0-homogeneous, then so is $Lf$, and $L^2 f = Lf$.

Proof. Since $E$ commutes with $L$, if $f$ is 0-homogeneous we have
$ELf = LEf = 0$, so $Lf$ is 0-homogeneous as well. If $f$ is 0-homogeneous,
$q_0 f$ is 1-homogeneous and thus by (38) and Theorem 5.3

$$\Lambda q_0 f = \Lambda E q_0 f = \Lambda q_0 \Lambda q_0 f,$$

so $I - L = \Lambda q_0 \circ$ is idempotent when restricted to 0-homogeneous
functions, whence so is $L$. □

Elaborating the consequences of these results we get:

Corollary 6.3. We have that:

(i) $Lf = 0$ if and only if $f = (I - L) g$ for some 0-homogeneous $g$;
equivalently, $f = \Lambda \phi$ for some 1-homogeneous $\phi$;

(ii) if $f$ is 0-homogeneous, then $(I - L) f = 0$ if and only if $f = Lg$ for
some 0-homogeneous $g$.

Proof. If $Lf = 0$, then $f$ is 0-homogeneous by Theorem 6.1. The other
properties are easy consequences of the fact that $L$ and $I - L$ are
complementary projections in the space of 0-homogeneous functions. □

Collecting everything, we have the following main result: 

Theorem 6.4. A q-function $s$ is a key local score function if and only
if any one (and then all) of the following conditions holds:

(i) The function $s$ satisfies the key equation $Ls = 0$, where the operator
$L$ is given by (19).

(ii) We can express $s = (I - L) g$ where $g$ is a 0-homogeneous q-function.

(iii) We can express $s = \Lambda \phi$ where $\phi$ is a 1-homogeneous
q-function and the operator $\Lambda$ is given by (25).

Moreover, $s$ is then 0-homogeneous.

When (ii) above holds, we say that $s$ is derived from $g$; when (iii) holds,
we say that $s$ is generated by $\phi$. The key local score function generated
by a 1-homogeneous q-function $\phi$ of order $t$ is thus

$$s(x, \mathbf{q}) = \sum_{k=0}^{t} (-1)^k D^k \phi_{[k]}(x, \mathbf{q}).$$
k=0 

The only term in $s$ that involves $q_{2t}$ is $(-1)^t \phi_{[tt]} q_{2t}$. In
particular, if $\phi_{[tt]} \ne 0$, $s$ is of exact order $2t$. Hence we have
demonstrated the existence of key local scoring rules of all positive even
orders.

The key local scoring rule $S$ generated by $\phi$ is then

(39)  $$S(x, Q) = \sum_{k=0}^{t} (-1)^k \frac{d^k}{dx^k} \phi_{[k]}\{x, q(x), q'(x), \ldots, q^{(t)}(x)\}.$$

For the case $t = 1$ we obtain a second-order rule:

$$S(x, Q) = \phi_{[0]}\{x, q(x), q'(x)\} - \frac{d}{dx} \phi_{[1]}\{x, q(x), q'(x)\},$$

where $\phi(x, q_0, q_1)$ is 1-homogeneous.

The Hyvärinen scoring rule (9) is generated in this way by
$\phi = -\frac{1}{2} q_1^2/q_0$. More generally, choosing
$\phi = -q_1^k/q_0^{k-1}$ ($k \ge 1$) yields

(40)  $$S(x, Q) = (k - 1)(y_1^k + k y_1^{k-2} y_2),$$

where $y_i := (d^i/dx^i) \ln q(x)$. We can express a general 1-homogeneous
$x$-independent q-function of order 1 as a power series:

(41)  $$\phi(q_0, q_1) = q_0 \sum_{k \ge 1} a_k (q_1/q_0)^k.$$

Now combining the rules (40) arising from the individual terms in (41), we
obtain the series form of a general $x$-independent second-order scoring rule
described by Ehm and Gneiting (2010).
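
As a worked instance of the case $t = 1$, the following short derivation checks that $\phi = -\frac{1}{2} q_1^2/q_0$ does generate the one-dimensional Hyvärinen score function. It uses only the rules $D q_0 = q_1$ and $D q_1 = q_2$ for the total derivative, and writes $\phi_{[k]}$ for $\partial \phi/\partial q_k$.

```latex
\begin{align*}
\phi &= -\tfrac{1}{2}\, q_1^2 / q_0, \qquad
\phi_{[0]} = \tfrac{1}{2}\, q_1^2 / q_0^2, \qquad
\phi_{[1]} = -\, q_1 / q_0, \\
s &= \phi_{[0]} - D \phi_{[1]}
   = \frac{q_1^2}{2 q_0^2} + \Bigl( \frac{q_2}{q_0} - \frac{q_1^2}{q_0^2} \Bigr)
   = \frac{q_2}{q_0} - \frac{1}{2} \Bigl( \frac{q_1}{q_0} \Bigr)^2 .
\end{align*}
```

Substituting $q_j = q^{(j)}(x)$ gives $q''(x)/q(x) - \frac{1}{2}\{q'(x)/q(x)\}^2$, and the result is visibly unchanged when every $q_j$ is multiplied by the same positive constant.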

7. Gauge transformation. The map $\phi \mapsto s = \Lambda \phi$ in Theorem
6.4(iii) is many-to-one: two 1-homogeneous functions $\phi_1$ and $\phi_2$ will
generate the identical score function $s = \Lambda \phi_1 = \Lambda \phi_2$ if
and only if $\Lambda(\phi_2 - \phi_1) = 0$. And this will hold if and only if
$\phi_2 - \phi_1$ has the total derivative form $D\psi$:

Lemma 7.1. Suppose $\phi$ is 1-homogeneous. Then $\Lambda \phi = 0$ if and only
if $\phi$ has the form $D\psi$.

Proof. If $\phi = D\psi$, then $\Lambda \phi = 0$ by (28). Conversely, suppose
$\phi$ is 1-homogeneous and $\Lambda \phi = 0$. Then $q_0^{-1} \phi$ is
0-homogeneous and $(I - L) q_0^{-1} \phi = 0$, so by Corollary 6.3(ii) there
exists 0-homogeneous $g$ such that $q_0^{-1} \phi = Lg$. Now take
$\psi = C q_0 g$, with $C$ given by (33). Then, using (37),
$D\psi = (E - q_0 \Lambda) q_0 g = (I - q_0 \Lambda) q_0 g$, since
$E q_0 g = q_0 g$ because $q_0 g$ is 1-homogeneous; and this is
$q_0 (I - \Lambda q_0 \circ) g = q_0 L g = \phi$. □

Borrowing terminology from physics, we term a transformation of the
form $\phi \to \phi + D\psi$ a gauge transformation; the invariance of $s$ under
such a transformation of $\phi$ is gauge invariance. The choice of a particular
function $\phi$, out of the equivalence class of functions differing only by a
total derivative $D\psi$ and thus generating the same scoring rule, is a gauge
choice.

Clearly if $\phi_2 - \phi_1 = D\psi$ and both $\phi_1$ and $\phi_2$ are 1-homogeneous, then $D\psi$
must be 1-homogeneous. This will be so if $\psi$ is itself 1-homogeneous. The
converse also essentially holds:

Lemma 7.2. Suppose $D\psi$ is 1-homogeneous. Then, for some constant $a$,
$\psi + a$ is 1-homogeneous.

Proof. We have $ED\psi = D\psi$. Since by Lemma 5.2 $D$ commutes with
$E$, $D(E\psi - \psi) = 0$. Thus by Lemma 3.1 $E\psi - \psi$ is a constant, $a$ say. Then
$E(\psi + a) = E\psi = \psi + a$, so $\psi + a$ is 1-homogeneous. □

Since the addition of a constant has no consequences for the analysis, we
henceforth call a transformation $\phi \to \phi + \kappa$ a gauge transformation if and
only if $\kappa$ has the form $D\psi$ with $\psi$ 1-homogeneous.

7.1. Standard gauge choice. For any key local score function $s$ we note
that

(42)  $\phi = q_0 s$  satisfies  $\Lambda\phi = \Lambda q_0 s = (I - L)s = s$,

and hence $\phi = q_0 s$ is a valid gauge choice for $s$. We call (42) the standard
gauge choice.

7.2. Equivalence. Suppose $s$ is generated by $\phi$, and let $\phi^* = \phi + \chi$ with
$\chi = a(x) q_0$. This is not a gauge transformation if $a \neq 0$, but the score
function it generates, $s^* = s + a(x)$, is equivalent to $s$, which we describe
by saying $\phi^*$ and $\phi$ are equivalent. Conversely, if $\phi^\dagger$ generates $s + a(x)$ it must
be a gauge transformation of $\phi^*$, and hence of the form $\phi + a(x) q_0 + D\psi$,
this form thus being necessary and sufficient for equivalence. We note in
particular that $\phi^\dagger$ of the form $\phi + \sum_{k \ge 0} a_k(x) q_k$ is equivalent to $\phi$, since it
generates $s^\dagger = s + \sum_{k \ge 0} (-1)^k a_k^{(k)}(x)$.

7.3. Nonexistence of odd-order key local scores. In Section 6 we established
the existence of key local score functions of all positive even orders.
Here we show that no key local score function can be of odd order.

Take $s = \Lambda\phi$ as in Theorem 6.4(iii), and suppose $s$ has odd order. If $\phi$ is
of order $t$, the order of $s$ is at most $2t$; since it is odd, it must be strictly
less than $2t$. Again, the only term in $s$ that could possibly involve $q_{2t}$ is
$(-1)^t \phi_{[tt]}\, q_{2t}$, so $\phi_{[tt]}$ must vanish. Hence $\phi_{[t]}$ does not involve $q_t$, and we
can write it as $A(x, q_0, \ldots, q_{t-1})$. Now define

(43)  $\psi(x, q_0, \ldots, q_{t-1}) := -\int_0^{q_{t-1}} A(x, q_0, \ldots, q_{t-2}, z)\, dz$

[for the case $t = 1$ the integrand on the right-hand side is $A(x, z)$]. It is easy
to see that $\psi$ is 1-homogeneous, and

(44)  $\phi_{[t]} + \psi_{[t-1]} = 0.$

Let $\phi^* = \phi + D\psi$, which is of order at most $t$. Since this is a gauge
transformation, $\phi^*$ generates the same scoring rule $s$ as does $\phi$. But from (44),
$\phi^*_{[t]} = \phi_{[t]} + \psi_{[t-1]} = 0$, so that $\phi^*$ is in fact of order at most $t - 1$, whence $s$ is
of order at most $2t - 2$. We can now repeat the argument, stepping down $t$
by 1 each time, until we reach a contradiction.

7.4. Second-order rule. A similar argument to the above shows that, for 
any key local scoring rule of exact even order 2t, there exists a gauge choice 
of exact order t. 

A second-order rule can thus always be generated by a 1-homogeneous $\phi$ of
order 1. However, a change of gauge may increase the order of the generating
function: for example, the standard gauge choice has order 2.

If $\phi_1$ and $\phi_2$ are both gauge choices of order 1, then their difference is of
order 1 and has the form $D\psi$ for some 1-homogeneous $\psi$. Then $\psi$ must be of
order 0, and hence of the form $\psi = c(x) q_0$. It follows that an order-1 gauge
choice is determined up to an additive term of the form $c'(x) q_0 + c(x) q_1$.
More generally, by Section 7.2 two 1-homogeneous functions $\phi_1$ and $\phi_2$ of
order 1 are equivalent if their difference has the linear form $a_0(x) q_0 + a_1(x) q_1$;
and this is also necessary, since, again by Section 7.2, $\phi_2$ must then have the
form $\phi_1 + a(x) q_0 + c'(x) q_0 + c(x) q_1$.

8. Decomposition. The variational analysis has identified the form (39),
where $\phi$ is 1-homogeneous, for a key local scoring rule. We now consider the
properties of such a rule in more detail.

Starting from (39), we compute the expected score,

$S(P,Q) = \int_-^+ dx\, p(x)\, S(x,Q) = \sum_k (-1)^k \int_-^+ dx\, p(x)\, \frac{d^k}{dx^k}\, \phi_{[k]}\{x, q(x), q'(x), \ldots, q^{(t)}(x)\}$

by evaluating the $k$th term in the sum using the integration by parts
formula (16). Collecting terms, we obtain

(45)  $S(P,Q) = S_0(P,Q) + S_+(P,Q) + S_-(P,Q),$

where the integral expected score $S_0$ is given by

(46)  $S_0(P,Q) = \int_-^+ dx \sum_k p_k\, \phi_{[k]}(\mathbf{q})$

and

(47)  $S_\pm(P,Q) = \mp S_b(\mathbf{p},\mathbf{q})|_\pm,$

where the boundary expected score $S_b$ is given by

(48)  $S_b(\mathbf{p},\mathbf{q}) := \sum_{r \ge 0} p_r B_r \phi(\mathbf{q})$

with $B_r$ defined by (32). In these formulas the dependence on $x$ has been
suppressed from the notation for simplicity, and we interpret $p_k := p^{(k)}(x)$,
$q_k := q^{(k)}(x)$.

Correspondingly, the entropy $H(Q) = S(Q,Q)$ can be decomposed:

(49)  $H(Q) = H_0(Q) + H_+(Q) + H_-(Q)$

with integral entropy

(50)  $H_0(Q) := \int_-^+ dx \sum_k q_k\, \phi_{[k]}(\mathbf{q}) = \int_-^+ dx\, \phi(\mathbf{q}),$

where the last equality follows from Euler's theorem (38); and
$H_\pm(Q) = \mp H_b(\mathbf{q})|_\pm$, where the boundary entropy $H_b(\mathbf{q})$ satisfies

$H_b(\mathbf{q}) = S_b(\mathbf{q},\mathbf{q}) = C\phi(\mathbf{q})$

with the operator $C$ defined by (33).
The divergence now becomes

(51)  $d(P,Q) = d_0(P,Q) + d_+(P,Q) + d_-(P,Q),$

where $d_0(P,Q) = S_0(P,Q) - H_0(P)$, etc. In particular, the boundary terms
arise from the boundary divergence

(52)  $d_b(P,Q) = \sum_r p_r B_r \{\phi(\mathbf{q}) - \phi(\mathbf{p})\}$

[where the final term involves substituting $\mathbf{p}$ for $\mathbf{q}$ after computing $B_r\phi(\mathbf{q})$];
while, using (38), the integral divergence can be written as

(53)  $d_0(P,Q) = \int_-^+ dx \left\{ \phi(\mathbf{q}) + \sum_k (p_k - q_k)\, \phi_{[k]}(\mathbf{q}) - \phi(\mathbf{p}) \right\}.$

It is easily seen that both $d_0$ and $d_b$ are unchanged by an equivalence
transformation $\phi \to \phi + \sum_{k \ge 0} a_k(x) q_k$.




8.1. Change of gauge. Although a key local scoring rule $S$ is unchanged
by a gauge transformation, the decompositions (45), (49) and (51), and in
particular the expression (53) for $d_0$, typically do change, terms being
redistributed between their constituents. Indeed, if we replace the generating $\phi$
by an alternative gauge choice

(54)  $\phi^* = \phi + D\psi,$

applying (46) yields

$S_0^*(P,Q) = S_0(P,Q) + J$

with

$J := \int_-^+ dx \sum_k p_k\, \frac{\partial}{\partial q_k} \{D\psi(\mathbf{q})\}.$

Using $(\partial/\partial q_k) D = D(\partial/\partial q_k) + \partial/\partial q_{k-1}$, and the interpretation of $D$ as $d/dx$,
this reduces to

$J = \int_-^+ dx\, \frac{d}{dx} \sum_k p_k\, \psi_{[k]}(\mathbf{q}) = S_+ + S_-,$

where $S_+ := S(\mathbf{p},\mathbf{q})|_+$, $S_- := -S(\mathbf{p},\mathbf{q})|_-$, with $S(\mathbf{p},\mathbf{q}) := \sum_k p_k\, \psi_{[k]}(\mathbf{q})$.

Similarly, from (47) and (48) we find the boundary expected score
transforming as

(55)  $S_b^*(\mathbf{p},\mathbf{q}) = S_b(\mathbf{p},\mathbf{q}) + \sum_k p_k B_k D\psi = S_b(\mathbf{p},\mathbf{q}) + S(\mathbf{p},\mathbf{q})$

on using (35). The changes to the boundary terms thus compensate exactly
(as they must) for the changes to the integral term.
We now have

$H_0^*(P) = H_0(P) + H_+ + H_-$

with $H_\pm := \pm H|_\pm$ and $H(\mathbf{p}) = \psi(\mathbf{p})$; this follows from (36) since $\psi$ is
1-homogeneous. Correspondingly the boundary entropy transforms as
$H_b^*(\mathbf{p}) = H_b(\mathbf{p}) + \psi(\mathbf{p})$.

It is notable that there is always a gauge choice for which the boundary
entropy vanishes. Specifically:

Theorem 8.1. Let $s$ be a key local score function. Then for the standard
gauge choice $\phi = q_0 s$, the boundary entropy function $H_b$ is identically 0.

Proof. From (37), $DH_b = DC\phi = E\phi - q_0\Lambda\phi$. Since $\phi$ is 1-homogeneous
and $s = \Lambda\phi$, this becomes $\phi - q_0 s = 0$. So $0 = CDH_b = EH_b$ by (36). But
$EH_b = H_b$ since $H_b$ is 1-homogeneous. □

The effect of a gauge transformation on the decomposition of the
divergence is

(56)  $d_0^*(P,Q) = d_0(P,Q) + d_+ + d_-,$

where $d_\pm := \pm d|_\pm$ with

(57)  $d(\mathbf{p},\mathbf{q}) := \sum_k p_k\, \psi_{[k]}(\mathbf{q}) - \psi(\mathbf{p}) = \psi(\mathbf{q}) + \sum_k (p_k - q_k)\, \psi_{[k]}(\mathbf{q}) - \psi(\mathbf{p})$

and with a compensating change to the boundary divergence $d_b$.

9. Propriety. In this section we investigate the propriety of a key local
scoring rule $S$. The scoring rule $S$ will be proper if and only if $d(P,Q) \ge 0$ for
all $P, Q \in \mathcal{P}$. Clearly it is sufficient to require nonnegativity of each term in
the right-hand side of the decomposition (51), and we proceed on this basis.
We investigate $d_+$ and $d_-$ in Section 10 below; here we consider the integral
term $d_0$.

We note the similarity between formula (53) and that for the Bregman
divergence (4) (especially where that is extended, as in Section 2.2, to allow
further dependence of $\phi$ on $x$). Correspondingly, concavity of the defining
function plays a crucial role here, too.

Definition 9.1. We call a 1-homogeneous $q$-function $\phi(x,\mathbf{q})$ concave
if, for every $x \in \mathcal{X}$, $\mathbf{q}_1, \mathbf{q}_2 \in \mathcal{Q}$,

(58)  $\phi(x, \mathbf{q}_1 + \mathbf{q}_2) \ge \phi(x, \mathbf{q}_1) + \phi(x, \mathbf{q}_2)$

(this is readily seen to be equivalent to the usual definition of concavity in $\mathbf{q}$,
for each $x$); and strictly concave if strict inequality in (58) holds whenever
the vectors $\mathbf{q}_1$ and $\mathbf{q}_2$ are linearly independent.

Theorem 9.1. Suppose that the scoring rule $S$ is generated by a concave
1-homogeneous $q$-function $\phi$. Then $d_0(P,Q)$, as given by (53), is nonnegative.
Further, if $\phi$ is strictly concave, then $d_0(P,Q) = 0$ if and only if $Q = P$.

Proof. Concavity implies that the integrand of (53) is nonnegative for
each $x$; under strict concavity it will be strictly positive with positive
probability when $Q \neq P$. □

Corollary 9.2. Suppose the conditions of Theorem 9.1 apply, and
the boundary terms $d_+(P,Q)$ and $d_-(P,Q)$ in (51) vanish identically for
$P, Q \in \mathcal{P}$. Then the (local, homogeneous) scoring rule (39) is proper (strictly
proper if $\phi$ is strictly concave).

9.1. Checking concavity. Given a 1-homogeneous $q$-function $\phi$ of order $m$,
define, for $\mathbf{u} = (u_1, \ldots, u_m) \in \mathbb{R}^m$,

(59)  $\Phi(x, \mathbf{u}) := \phi(x, 1, \mathbf{u}).$

Then $\phi(x,\mathbf{q})$ is determined by $\Phi$:

(60)  $\phi(x, \mathbf{q}) = q_0\, \Phi(x, \mathbf{u})$

with $u_i = q_i/q_0$ ($i \ge 1$). It is often easier to check concavity for $\Phi$ than for $\phi$,
and this is enough:

Lemma 9.3. $\Phi$ is concave in $\mathbf{u}$ if and only if $\phi$ is concave in $\mathbf{q}$.

Proof. "If" follows immediately from (59). Conversely, if $\Phi$ is concave, then
writing $\mathbf{v} := (p_1, \ldots, p_m)/p_0$ and $\mathbf{u} := (q_1, \ldots, q_m)/q_0$,

$\phi(x, \mathbf{p} + \mathbf{q}) = (p_0 + q_0)\, \Phi\!\left(x, \frac{p_0}{p_0 + q_0}\, \mathbf{v} + \frac{q_0}{p_0 + q_0}\, \mathbf{u}\right)$

$\ge p_0\, \Phi(x, \mathbf{v}) + q_0\, \Phi(x, \mathbf{u})$

$= \phi(x, \mathbf{p}) + \phi(x, \mathbf{q}).$ □

It is further easy to see that $\Phi$ is strictly concave in $\mathbf{u}$, in the usual sense,
if and only if $\phi$ is strictly concave in $\mathbf{q}$ in the sense of Definition 9.1.
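
As a small numerical illustration of (58) to (60), the sketch below takes the Hyvärinen generator $\phi = -\frac{1}{2} q_1^2/q_0$ (so $\Phi(u_1) = -\frac{1}{2} u_1^2$) and evaluates the gap $\phi(\mathbf{p}+\mathbf{q}) - \phi(\mathbf{p}) - \phi(\mathbf{q})$, which concavity requires to be nonnegative. The function names are hypothetical, and the closed form $p_0 q_0 (v_1 - u_1)^2 / \{2(p_0 + q_0)\}$ used for comparison is an elementary identity for this particular generator, not a formula taken from the text.

```python
def phi(q0, q1):
    """Hyvarinen generator phi = -q1^2 / (2 q0): 1-homogeneous, order 1, x-independent."""
    return -0.5 * q1**2 / q0


def capital_phi(u1):
    """Phi(u) = phi(1, u), as in (59)."""
    return phi(1.0, u1)


def superadditivity_gap(p, q):
    """phi(p + q) - phi(p) - phi(q); nonnegative when phi is concave in the sense of (58)."""
    return phi(p[0] + q[0], p[1] + q[1]) - phi(p[0], p[1]) - phi(q[0], q[1])


def gap_closed_form(p, q):
    """Closed form of the gap for this generator, with v1 = p1/p0 and u1 = q1/q0."""
    v1, u1 = p[1] / p[0], q[1] / q[0]
    return p[0] * q[0] * (v1 - u1) ** 2 / (2.0 * (p[0] + q[0]))
```

The gap vanishes exactly when $v_1 = u_1$, that is, when $\mathbf{p}$ and $\mathbf{q}$ are proportional, in line with the definition of strict concavity.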

9.2. Change of gauge. Even if the initial gauge choice $\phi$ is concave in $\mathbf{q}$,
so that $d_0(\mathbf{p},\mathbf{q}) \ge 0$, under a gauge transformation (54) the term $d(\mathbf{p},\mathbf{q})$, as
given by (57), means that the gauge-transformed integral divergence term $d_0^*$,
given by (56), need not be nonnegative; this would hold if the resulting gauge
choice $\phi^*$ were itself concave, but typically this will not be so.

Note that if $\psi$ in (54) is concave, then $d(\mathbf{p},\mathbf{q}) \ge 0$. However, this does not
ensure positivity of both additional terms, since while the added term
$d_+ = d(\mathbf{p},\mathbf{q})|_+$ will then be nonnegative, the other added term $d_- = -d(\mathbf{p},\mathbf{q})|_-$
will be nonpositive.

Example 9.1. The Hyvärinen scoring rule (9) on $\mathcal{X} = \mathbb{R}$ is generated by
the strictly concave $q$-function $\phi = -\frac{1}{2} q_1^2/q_0$. Using this gauge choice in (53)
yields [cf. (1)]

(61)  $d_0(P,Q) = \frac{1}{2} \int dx\, p(x)\, (v_1 - u_1)^2$

with $u_1 := q'(x)/q(x)$, $v_1 := p'(x)/p(x)$.

Alternatively we might use the standard gauge choice, $q_2 - \frac{1}{2} q_1^2/q_0$, which
is also strictly concave, and indeed yields the same expression (61).

Now let $\psi := -\frac{1}{2} q_1 \ln(q_1/q_0)$, so that $D\psi = -\frac{1}{2}\{q_2 \ln(q_1/q_0) + q_2 - q_1^2/q_0\}$.
Then $\phi^* = \phi + D\psi = -\frac{1}{2} q_2 \{1 + \ln(q_1/q_0)\}$ is another possible gauge choice,
generating the identical scoring rule $S$. However, $\phi^*$ is not concave, and the
integral divergence term (53) associated with $\phi^*$ is

$d_0^*(P,Q) = \frac{1}{2} \int dx\, p(x) \left\{ u_2 \left(1 - \frac{v_1}{u_1}\right) + v_2 \ln \frac{v_1}{u_1} \right\},$

with $u_2 := q''(x)/q(x)$, $v_2 := p''(x)/p(x)$, which is not nonnegative. In this case
the extra terms in (56) arise from
$d(\mathbf{p},\mathbf{q}) = \frac{1}{2} p_0 \{u_1 - v_1 + v_1 \ln(v_1/u_1)\}$.
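
To make (61) concrete, the following sketch evaluates $d_0(P,Q) = \frac{1}{2}\int p(x)(v_1 - u_1)^2\,dx$ by the trapezoidal rule for two normal densities. The choice of normal densities, the integration limits and the function names are assumptions made purely for illustration. For $P = N(0,1)$ and $Q = N(m, s^2)$ one has $v_1 - u_1 = x(1/s^2 - 1) - m/s^2$, so the integral has the closed form $\frac{1}{2}\{(1/s^2 - 1)^2 + m^2/s^4\}$, which gives a simple check.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_log_derivative(x, mean, sd):
    """d/dx of the log density of N(mean, sd^2), i.e. q'(x)/q(x)."""
    return -(x - mean) / sd**2


def hyvarinen_integral_divergence(mean_p, sd_p, mean_q, sd_q,
                                  lo=-12.0, hi=12.0, n=20000):
    """Trapezoidal approximation to (61): 0.5 * integral of p(x) * (v1 - u1)^2."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        v1 = normal_log_derivative(x, mean_p, sd_p)
        u1 = normal_log_derivative(x, mean_q, sd_q)
        weight = 0.5 if i in (0, n) else 1.0  # end-point weights of the trapezoidal rule
        total += weight * normal_pdf(x, mean_p, sd_p) * (v1 - u1) ** 2
    return 0.5 * h * total
```

With $P = N(0,1)$ and $Q = N(1, 2^2)$ the closed form gives $\frac{1}{2}(0.5625 + 0.0625) = 0.3125$, and the divergence of a distribution from itself is zero.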

In the light of the above example it might be conjectured that, if $s$ can
be generated from some concave gauge choice, then the standard gauge
choice $\phi = q_0 s$ will be concave (equivalently, from Lemma 9.3, $s$ itself will
be a concave function of the $u_i = q_i/q_0$, $i \ge 1$), but this need not hold:

Example 9.2. Take $\Phi = -u_1^4$ in (60). Then $\Phi$, and hence $\phi$, is concave,
but $s = 12 u_1^2 u_2 - 9 u_1^4$ is not concave.
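
The failure of concavity in Example 9.2 can be exhibited with a single midpoint comparison. In the sketch below (function names and the particular test points are illustrative choices, not from the text), the points $a = (1.5, 1.5)$ and $b = (0.5, -1.5)$ in the $(u_1, u_2)$ plane have midpoint $(1, 0)$; a concave function would satisfy $s(\text{midpoint}) \ge \frac{1}{2}\{s(a) + s(b)\}$, whereas here the inequality goes the other way.

```python
def capital_phi(u1):
    """Phi(u1) = -u1^4, the concave generator of Example 9.2."""
    return -u1**4


def score(u1, u2):
    """Score function s = 12 u1^2 u2 - 9 u1^4 generated by Phi = -u1^4."""
    return 12.0 * u1**2 * u2 - 9.0 * u1**4


def midpoint_gap(f, a, b):
    """f(midpoint) - average of f(a), f(b); must be >= 0 for every pair if f is concave."""
    mid = tuple(0.5 * (ai + bi) for ai, bi in zip(a, b))
    return f(*mid) - 0.5 * (f(*a) + f(*b))


# Phi passes the midpoint test on these points, but s fails it.
gap_phi = midpoint_gap(capital_phi, (1.5,), (0.5,))
gap_s = midpoint_gap(score, (1.5, 1.5), (0.5, -1.5))
```

Here `gap_phi` is positive while `gap_s` is negative, so $s$ cannot be concave in $(u_1, u_2)$ even though $\Phi$ is concave in $u_1$.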

10. Boundary issues. The boundary divergence terms in (51) are
$d_\pm(P,Q) = \mp d_b(\mathbf{p},\mathbf{q})|_\pm$, where $d_b$ is given by (52). Their behavior will depend on
the family $\mathcal{P}$ of distributions under consideration, and specifically on the
behavior, at the end-points $+$ and $-$, of the densities of distributions in $\mathcal{P}$.

For propriety of these terms, we want $d_b(\mathbf{p},\mathbf{q})$ to be positive at the lower
end-point $-$, and negative at the upper end-point $+$, for all $P, Q \in \mathcal{P}$. For
simplicity we might impose conditions on $\mathcal{P}$ sufficient to ensure that, for
all densities $p(\cdot), q(\cdot) \in \mathcal{P}$, $d_b(\mathbf{p},\mathbf{q})$ vanishes at the end-points. A family $\mathcal{P}$
having this property may be termed valid (with respect to the generating
function $\phi$). However, there does not appear to be a natural choice for such
a valid class $\mathcal{P}$. In particular, if $\mathcal{P}$ and $\mathcal{P}'$ are both valid families, it does not
follow that their union will be.

Note that the validity requirement depends on the gauge choice $\phi$, and
a change of gauge could assist in ensuring that it holds.

For the special case of the standard gauge choice, $\phi^* = q_0 s$, we know
from Theorem 8.1 that the boundary entropy vanishes. If the boundary
quantities $S_b^*$, $H_b^*$, $d_b^*$ behaved like regular quantities $S$, $H$, $d$ we could
deduce $d_b^* = 0$ [Dawid (1998)]; but this is a big "if," and the result will not
hold without imposing further conditions.

10.1. Second-order rules. For a second-order rule with 1-homogeneous
generator $\phi(x, q_0, q_1)$, we find

$d_b = p_0 \{\phi_{[1]}(\mathbf{q}) - \phi_{[1]}(\mathbf{p})\}.$

Alternatively, the standard gauge choice is $\phi^* = \phi + D\psi$ with
$\psi = -C\phi = -q_0 \phi_{[1]}$. From Section 8.1, we find

$d_b^*(\mathbf{p},\mathbf{q}) = S_b^*(\mathbf{p},\mathbf{q}) = -q_0 (p_0 \phi_{[01]} + p_1 \phi_{[11]}).$

That this vanishes (as we know from Theorem 8.1 it must) for $\mathbf{p} = \mathbf{q}$ may
be seen on differentiating the relation $\phi = q_0 \phi_{[0]} + q_1 \phi_{[1]}$ with respect to $q_1$;
that it does not depend on the choice of gauge $\phi$ of order 1 follows from
Section 7.4.

With $p_i = p^{(i)}(x)$, etc., we want $d_b(\mathbf{p},\mathbf{q})$ [or, for the standard gauge choice,
$d_b^*(\mathbf{p},\mathbf{q})$] to vanish in the limit as we approach the end-points $-$ and $+$, for
all densities $p(\cdot)$, $q(\cdot)$ of distributions in $\mathcal{P}$. Conditions for validity will thus
involve the behavior of $p(x)$ and $p'(x)$ at these end-points.

For example, for the Hyvärinen rule, with gauge choice $\phi = -\frac{1}{2} q_1^2/q_0$, we
require

(62)  $d_b(\mathbf{p},\mathbf{q}) = p_0 \left( \frac{p_1}{p_0} - \frac{q_1}{q_0} \right) \to 0$

as we approach the end-points of $\mathcal{X}$. (The same expression for $d_b$ arises
if we use the standard gauge choice $\phi^* = q_2 - \frac{1}{2} q_1^2/q_0$, which in this case
is equivalent to $\phi$.) To ensure (62) we might require, for example, that,
for all densities $p(\cdot)$ in $\mathcal{P}$, $\lim_{x \to \pm} p(x) = 0$ and $\lim_{x \to \pm} p'(x)/p(x)$ is finite.
However, this excludes the possibility that both $p$ and $q$ are normal densities
on $\mathcal{X} = \mathbb{R}$, even though, with this choice, $d_b$ as given by (62) does vanish at
$\pm\infty$. Ehm and Gneiting (2010, 2012) described alternative conditions on $\mathcal{P}$
that do admit this case.

In the ideal situation we will have a (strictly) concave 1-homogeneous $\phi$,
and a family $\mathcal{P}$ valid with respect to $\phi$. Then the associated key local scoring
rule $S$ will be (strictly) proper.

11. Transformation of the data. So far we have considered a variable $X$
taking values in a real interval $\mathcal{X}$, and have made essential use of the
Euclidean structure of $\mathcal{X}$ to define probability densities, derivatives, etc. Taking
a step backward, suppose we start with an abstract measurable sample space
(the basic sample space) $\mathcal{X}^*$, a basic variable $X^*$ taking values in $\mathcal{X}^*$, and
a collection $\mathcal{P}^*$ of basic distributions for $X^*$ over $\mathcal{X}^*$. Without assuming any
further structure, we can define a basic scoring rule $S^*: \mathcal{X}^* \times \mathcal{P}^* \to \mathbb{R}$, and
introduce the property of (strict) propriety, exactly as before. However, at
this level of generality it is less straightforward to define what we should
mean by saying that a basic scoring rule is local. To do this we proceed as
follows.

We suppose given a collection $\Xi = \{\xi\}$ of charts, where each $\xi$ is an
invertible measurable function from $\mathcal{X}^*$ onto some open interval $\mathcal{X} \subseteq \mathbb{R}$, and
such that, for $\xi, \tilde\xi \in \Xi$, the composition $\tilde\xi\xi^{-1}: \mathcal{X} \to \tilde{\mathcal{X}}$ is smooth and regular,
that is, infinitely often differentiable with strictly positive first derivative. In
other words, the basic space is a one-dimensional simply connected smooth
manifold.

Picking any specific chart $\xi$ produces a concrete representation of the
abstract basic structure, in terms of the real variable $X := \xi(X^*)$, and, for
any $Q^* \in \mathcal{P}^*$, the induced distribution $Q$ for $X$ on $\mathcal{X} \subseteq \mathbb{R}$ [so that
$Q(A) = Q^*\{\xi^{-1}(A)\}$]; we take $\mathcal{P} := \{P : P^* \in \mathcal{P}^*\}$. Correspondingly, a basic function
$f^*: \mathcal{X}^* \times \mathcal{P}^* \to \mathbb{R}$ (e.g., a scoring rule) is represented by $f: \mathcal{X} \times \mathcal{P} \to \mathbb{R}$, such
that $f(x, Q) = f^*(x^*, Q^*)$.

Let $\xi, \tilde\xi$ be two such charts, and $X = \xi(X^*)$, $\tilde X = \tilde\xi(X^*)$, etc. Then
$\tilde X = \gamma(X)$, where $\gamma = \tilde\xi\xi^{-1}$ is strictly increasing, and both $\gamma$ and $\delta := \gamma^{-1}$ are
smooth and regular. A given basic distribution $Q^*$ for $X^*$ can be represented
either by the distribution $Q$, for $X$, or by $\tilde Q$, for $\tilde X$. We assume that $Q$
has a density function, $q(\cdot)$, with respect to Lebesgue measure on $\mathcal{X}$; then
the density function $\tilde q(\cdot)$ of $\tilde Q$ with respect to Lebesgue measure on $\tilde{\mathcal{X}}$ will
likewise exist, and, with $\tilde x = \gamma(x)$, we will have

(63)  $\tilde q(\tilde x) = q(x)\, \frac{dx}{d\tilde x} = a(x)\, q(x)$

with $a(x) := \gamma'(x)^{-1}$. An easy induction shows that we can express

(64)  $\tilde q^{(k)}(\tilde x) = T_k(x, q(x), \ldots, q^{(k)}(x)),$

where $T_k$ has the form

(65)  $T_k(x, q_0, \ldots, q_k) = \sum_r a_{kr}(x)\, q_r$

and the coefficients $a_{kr}(x)$ satisfy $a_{kr}(x) = 0$ unless $0 \le r \le k$, $a_{00}(x) = a(x)$,
and

(66)  $a_{k+1,r}(x) = a(x) \{a'_{kr}(x) + a_{k,r-1}(x)\}.$

In similar fashion we can express

(67)  $q^{(k)}(x) = \tilde T_k(\tilde x, \tilde q(\tilde x), \ldots, \tilde q^{(k)}(\tilde x)) = \sum_r \tilde a_{kr}(\tilde x)\, \tilde q^{(r)}(\tilde x).$
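
The recursion (66), $a_{k+1,r} = a\,(a'_{kr} + a_{k,r-1})$ with $a_{00} = a$, is easy to run mechanically when $a(x)$ is a polynomial. The sketch below is a minimal illustration under that assumption: polynomials are stored as coefficient lists `[c0, c1, ...]`, and all helper names are hypothetical. For $a(x) = x$ (the case $\gamma(x) = \ln x$) it produces the rows $a_{1\cdot} = (x, x^2)$ and $a_{2\cdot} = (x, 3x^2, x^3)$.

```python
def trim(p):
    """Drop trailing zero coefficients, keeping at least one entry."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p


def poly_add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]


def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def poly_diff(p):
    """Derivative of a polynomial given as a coefficient list."""
    return [i * c for i, c in enumerate(p)][1:] or [0]


def chart_coefficients(a, kmax):
    """Rows table[k][r] = a_{kr}(x) for k = 0..kmax, built from recursion (66)."""
    table = [[trim(a)]]  # a_00 = a
    for k in range(kmax):
        row = table[k]

        def entry(r, row=row):
            # a_{kr} = 0 unless 0 <= r <= k
            return row[r] if 0 <= r < len(row) else [0]

        table.append([
            trim(poly_mul(a, poly_add(poly_diff(entry(r)), entry(r - 1))))
            for r in range(k + 2)
        ])
    return table
```

For example, `chart_coefficients([0, 1], 2)[2]` gives the coefficient lists of $x$, $3x^2$ and $x^3$, so that $\tilde q_2 = x q_0 + 3x^2 q_1 + x^3 q_2$ in this chart.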

It readily follows from (64) and (67) that a basic function $f^*(x^*, Q^*)$ can
be written, in the $\xi$-representation, in the form $f(x, q(x), q'(x), \ldots, q^{(m)}(x))$ if
and only if the analogous property holds in the $\tilde\xi$-representation:
$f^*(x^*, Q^*) = \tilde f(\tilde x, \tilde q(\tilde x), \tilde q'(\tilde x), \ldots, \tilde q^{(m)}(\tilde x))$. That is, the property of being $m$-local is
independent of the particular representation used. When this property holds for
one, and thus for all, representations, we can say that the basic function
$f^*(x^*, Q^*)$ itself is $m$-local; a $q$-function $f$ such that
$f^*(x^*, Q^*) = f(x, q(x), q'(x), \ldots, q^{(m)}(x))$ is the $\xi$-representation of $f^*$. We denote the vector space
of all local basic functions by $\mathcal{V}^*$.

At a more abstract level, motivated by (64) and (65), we define variables

(68)  $\tilde x := \gamma(x), \qquad \tilde q_k := T_k(x, q_0, \ldots, q_k) = \sum_r a_{kr}(x)\, q_r.$






Inversely, we will then have

(69)  $x = \delta(\tilde x), \qquad q_k = \tilde T_k(\tilde x, \tilde q_0, \ldots, \tilde q_k) = \sum_r \tilde a_{kr}(\tilde x)\, \tilde q_r.$

Using (69), any $q$-function of order $m$, $f(x, q_0, \ldots, q_m)$ can be rewritten
as $\tilde f(\tilde x, \tilde q_0, \ldots, \tilde q_m)$. If $f^* \in \mathcal{V}^*$ has $\xi$- and $\tilde\xi$-representations $f$ and $\tilde f$,
respectively, then $\tilde f$ can be obtained by reexpressing $f$ in this way. Since $T_k$ is
1-homogeneous, $f$ is homogeneous of degree $h$ in the $q$'s if and only if $\tilde f$ is
homogeneous of degree $h$ in the $\tilde q$'s. In this case we may term the underlying
local basic function $f^* \in \mathcal{V}^*$ $h$-homogeneous. Likewise, since (for fixed $x$
or $\tilde x$) the functions $T_k$ and $\tilde T_k$ are linear, $f$ is (strictly) concave in the $q$'s if
and only if $\tilde f$ is (strictly) concave in the $\tilde q$'s, in which case we may term $f^*$
itself (strictly) concave.

11.1. Invariant operators. The linear differential operators $D$ and $L$ have
only been defined in terms of a specific representation of the problem on the
real line, as determined by some chart $\xi$. Applying these definitions starting
from a different real representation, determined by a chart $\tilde\xi$, we will obtain
possibly different operators, $\tilde D$, $\tilde L$. The following results relate these. We
need the following lemma:

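Lemma 11.1. We have

(70)  $\frac{\partial}{\partial q_r} = \sum_k a_{kr}\, \frac{\partial}{\partial \tilde q_k},$

(71)  $\frac{\partial}{\partial x} = a^{-1}\, \frac{\partial}{\partial \tilde x} + \sum_k \sum_r a'_{kr}\, q_r\, \frac{\partial}{\partial \tilde q_k}.$

Proof. Equation (70) follows immediately from (68). For (71) we have

$\frac{\partial}{\partial x} = \frac{\partial \tilde x}{\partial x}\, \frac{\partial}{\partial \tilde x} + \sum_k \frac{\partial \tilde q_k}{\partial x}\, \frac{\partial}{\partial \tilde q_k}.$

But $\partial \tilde x/\partial x = a^{-1}$, while from (68)

$\frac{\partial \tilde q_k}{\partial x} = \sum_r a'_{kr}\, q_r,$

so (71) follows. □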

We now show that if $f$ and $\tilde f$ are, respectively, the $\xi$ and $\tilde\xi$ representations
of the same basic function $f^*$, then $\tilde D \tilde f$ is the $\tilde\xi$-representation of the basic
function whose $\xi$-representation is $a(x) Df$. Note that the function $a$, and
hence the basic function so represented, will depend on the charts considered.

Theorem 11.2. It holds that

$\tilde D = a(x) D.$

Proof. Informally, we observe that $D$ corresponds to the total derivative
$d/dx$ and $\tilde D$ to $d/d\tilde x$. Thus we expect $\tilde D = (dx/d\tilde x) D$.
More formally, we have

$\tilde D = \frac{\partial}{\partial \tilde x} + \sum_k \tilde q_{k+1}\, \frac{\partial}{\partial \tilde q_k}.$

On applying Lemma 11.1, together with (66) and (68), this reduces to $aD$. □

Since, by the transformation rule (63) for densities, $\tilde q_0 = a(x) q_0$, we thus
have

Corollary 11.3. It holds that $\tilde q_0^{-1} \tilde D = q_0^{-1} D$.

It follows from Corollary 11.3 that, for $f^* \in \mathcal{V}^*$, there exists $g^* \in \mathcal{V}^*$ such
that, in any representation, $g = q_0^{-1} Df$. This shows the existence of an
"invariant" linear operator $D^*$ on $\mathcal{V}^*$ such that, in any representation, $D^* f^*$ is
represented by $q_0^{-1} Df$.

We next show that there exists an invariant linear operator $L^*$ on $\mathcal{V}^*$
such that, in any representation, if $f^*$ is represented by $f$, then $L^* f^*$ is
represented by $Lf$.

Theorem 11.4. We have $\tilde L = L$.

Proof. On substituting (63) and (70) into the definition (19) of L and 
rearranging, we obtain 





on using (68). From (66) this is 




(72) 




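where the operator $A_k$ is given by

$A_k := \sum_r (-1)^{k-r} D^r a_{kr} \circ.$

The theorem will thus be proved if we can show $A_k = \tilde D^k a \circ$; that is, using
Theorem 11.2, we have to show:

(73)  $H_k: \quad (aD)^k a \circ = \sum_r (-1)^{k-r} D^r a_{kr} \circ.$

We prove (73) by induction on $k$. First, $H_0$ holds since both sides reduce to
$a \circ$. Now suppose $H_k$ holds. Then

(74)  $(aD)^{k+1} a \circ = (aD)^k aD a \circ = \sum_r (-1)^{k-r} D^r a_{kr} D a \circ.$

But $a_{kr} D = (D a_{kr} \circ) - a'_{kr} \circ$, so (74) becomes

$\sum_r (-1)^{k-r} D^{r+1} a_{kr} a \circ - \sum_r (-1)^{k-r} D^r a'_{kr} a \circ,$

which can be written as

$\sum_r (-1)^{k+1-r} D^r a\, (a'_{kr} + a_{k,r-1}) \circ,$

and on applying (66) we have verified $H_{k+1}$. □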

11.2. Invariance of scoring rule. On applying Theorem 11.4, we see that
the general homogeneous key local scoring rule, as given by (ii) of
Theorem 6.4, can be expressed invariantly as

$S^*(x^*, Q^*) = (I - L^*)\, g^*(x^*, Q^*),$

where $g^*$ is a 0-homogeneous local basic function. Then, in any
representation, we will have $S(x, Q) = (I - L) g(x, Q)$. We may thus say that the
scoring rule $S^*$ is derived from the local basic function $g^*$. In particular the
expected score $S^*(P^*, Q^*)$, and consequently the entropy function $H^*(P^*)$
and the divergence function $d^*(P^*, Q^*)$, are fully determined by the basic
function $g^*$, independently of how that may be represented.

In fact more is true: the individual components $S_0(P,Q)$, $S_+(P,Q)$,
$S_-(P,Q)$ of $S(P,Q)$, in the decomposition (45) arising from the integration
by parts, each correspond to an invariant expression $S_0^*(P^*, Q^*)$, $S_+^*(P^*, Q^*)$,
$S_-^*(P^*, Q^*)$ (and similarly for the decompositions of $H$ and $d$).

We show this first for the integral term $S_0$. We need the following lemma,
showing that the expression $\Pi := \sum_r p_r\, \partial/\partial q_r$ represents an invariant
operator $\Pi^*$ (depending on a distribution $P^*$, and acting on a function of
a distribution Q* , both defined over V*). 

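Lemma 11.5. We have

$\sum_r p_r\, \frac{\partial}{\partial q_r} = \sum_k \tilde p_k\, \frac{\partial}{\partial \tilde q_k}.$

Proof. Using (70) and (68), we have

$\sum_r p_r\, \frac{\partial}{\partial q_r} = \sum_r p_r \sum_k a_{kr}\, \frac{\partial}{\partial \tilde q_k} = \sum_k \left( \sum_{r=0}^{k} a_{kr}\, p_r \right) \frac{\partial}{\partial \tilde q_k} = \sum_k \tilde p_k\, \frac{\partial}{\partial \tilde q_k}.$ □

Now consider expression (46) for $S_0(P,Q)$, where, in accordance with (ii)
and (iii) of Theorem 6.4, $\phi = q_0 g$, with $g$ the representation of a local basic
function $g^*$. The integrand can then be written as $(p_0 + q_0 \Pi) g$, whence
$S_0(P,Q) = E_P\, g + E_Q(\Pi g) = E_{P^*} g^* + E_{Q^*}(\Pi^* g^*)$, which thus has an invariant
form, $S_0^*(P^*, Q^*)$, independently of the representation employed.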

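We next demonstrate the corresponding property for the boundary term $S_b$.

Theorem 11.6. We have

(75)  $\sum_r p_r B_r = \sum_k \tilde p_k \tilde B_k\, a \circ.$

Proof. On substituting (70), using (65) and Theorem 11.2, and
rearranging, the statement of the theorem becomes

$\sum_r p_r \sum_{m \ge r+1} \sum_{k=r+1}^{m} (-1)^{k-r-1} D^{k-r-1} a_{mk}\, \frac{\partial}{\partial \tilde q_m} = \sum_r p_r \sum_{m \ge r+1} \sum_{k=r}^{m-1} a_{kr} (-1)^{m-k-1} (aD)^{m-k-1} a\, \frac{\partial}{\partial \tilde q_m}.$

The theorem will thus be proved if we can show

(76)  $H_m: \quad \sum_{k=r+1}^{m} (-1)^{k-r-1} D^{k-r-1} a_{mk} \circ = \sum_{k=r}^{m-1} a_{kr} (-1)^{m-k-1} (aD)^{m-k-1} a \circ$

for all $m \ge r + 1$. We prove (76) by induction on $m$. First, $H_{r+1}$ holds since
the left-hand side reduces to $a_{r+1,r+1} \circ$ and the right-hand side reduces to
$a_{rr} a \circ$, and these are equal by (66). Now suppose $H_m$ holds. Then

(77)  $\sum_{k=r+1}^{m} (-1)^{k-r-1} D^{k-r-1} a_{mk} D a \circ$

(78)  $= \sum_{k=r}^{m-1} a_{kr} (-1)^{m-k-1} (aD)^{m-k-1} a D a \circ.$

But $a_{mk} D = (D a_{mk} \circ) - a'_{mk} \circ$, so that, on applying (66), the left-hand side
of (78) becomes

$a_{mr} a \circ - \sum_{k=r+1}^{m+1} (-1)^{k-r-1} D^{k-r-1} a_{m+1,k} \circ.$

The right-hand side of (78) straightforwardly becomes

$a_{mr} a \circ - \sum_{k=r}^{m} a_{kr} (-1)^{m-k} (aD)^{m-k} a \circ,$

and we have thus verified $H_{m+1}$. □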

It follows from (75) that $\sum_r p_r B_r q_0 = \sum_r \tilde p_r \tilde B_r \tilde q_0$, which thus defines an
invariant operator. Let now $S^*$, with representations $S$, $\tilde S$, derive from the
0-homogeneous basic function $g^*$, with representations $g$, $\tilde g$. On using (48),
in which $\phi = q_0 g$, we get

$S_b(\mathbf{p},\mathbf{q}) = \sum_r p_r B_r q_0 g = \sum_r \tilde p_r \tilde B_r \tilde q_0 \tilde g = \tilde S_b(\tilde{\mathbf{p}}, \tilde{\mathbf{q}}).$

Hence by (47) the boundary contributions $S_\pm(P,Q)$ will be the same in all
representations.

Example 11.1 (Modified Hyvärinen rule). Take $\mathcal{X} = (0, \infty)$, $\tilde{\mathcal{X}} = \mathbb{R}$,
$\gamma(x) = \ln x$ (so $\tilde X = \ln X$). Then $a(x) = x$ and we find $\tilde q_0 = x q_0$,
$\tilde q_1 = x q_0 + x^2 q_1$, $\tilde q_2 = x q_0 + 3x^2 q_1 + x^3 q_2$.

Let the scoring rule in the $\tilde\xi$-representation, $\tilde S$, be defined by the Hyvärinen
formula:

(79)  $\tilde S(\tilde x, \tilde Q) = \frac{\tilde q''(\tilde x)}{\tilde q(\tilde x)} - \frac{1}{2} \left\{ \frac{\tilde q'(\tilde x)}{\tilde q(\tilde x)} \right\}^2.$

This derives from the function $\tilde g = -\frac{1}{2} (\tilde q_1/\tilde q_0)^2$.

Reexpressed in the $\xi$-representation, we have

(80)  $S(x, Q) = x^2\, \frac{q''(x)}{q(x)} - \frac{1}{2} \left\{ x\, \frac{q'(x)}{q(x)} \right\}^2 + 2x\, \frac{q'(x)}{q(x)} + \frac{1}{2},$

which itself derives from the $\xi$-reexpression of $\tilde g$, viz., $g = -\frac{1}{2} (1 + x q_1/q_0)^2$.
That is, it is generated by $\phi = q_0 g = -\frac{1}{2} q_0 - x q_1 - \frac{1}{2} x^2 q_1^2/q_0$. The simpler
choice $\phi^* = -\frac{1}{2} x^2 q_1^2/q_0$ is equivalent to $\phi$, and thus generates an equivalent
scoring rule, with the same divergence function; in fact, it simply eliminates
the final term $+\frac{1}{2}$ in (80). This form of the scoring rule also appears in
equation (28) of Hyvärinen (2007).

For this scoring rule, a class $\mathcal{P}^*$ of distributions for the basic variable $X^*$
will be valid if, for $P, Q \in \mathcal{P}^*$, $\tilde p_0 \{(\tilde p_1/\tilde p_0) - (\tilde q_1/\tilde q_0)\} \to 0$ as $\tilde x \to \pm\infty$, where
these expressions are based on the $\tilde\xi$-representation [in which $\tilde S$ is given
by (79)]. Reexpressing this in the $\xi$-representation, we want

(81)  $x^2 p_0 \{(p_1/p_0) - (q_1/q_0)\} \to 0$  as $x \to 0$ or $\infty$.

At the lower end-point of $\mathcal{X}$, this condition is less restrictive than the
corresponding condition (62) for the regular Hyvärinen scoring rule defined
directly on $\mathcal{X}$, although it becomes more restrictive at $\infty$.

In particular, suppose we consider the family $\mathcal{E}$ of exponential densities:

$q(x \mid \theta) = \theta e^{-\theta x} \qquad (x, \theta > 0).$

For $p, q \in \mathcal{E}$, condition (81) is satisfied, whereas (62) is not. If we tried to
apply the unmodified Hyvärinen score (9) to estimate $\theta$ in this model, we
would obtain $S(x, Q_\theta) = \frac{1}{2}\theta^2$, and (7) would then appear to yield the clearly
nonsensical estimate $\hat\theta = 0$. This is due to failure of the boundary
conditions, so that the original Hyvärinen rule is not in fact proper in this case.
The modified rule (80) is proper for this family, and yields the consistent
estimator $\hat\theta = 2 \sum_i X_i / \sum_i X_i^2$.
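
The exponential example can be traced numerically. The sketch below assumes that the modified rule (80) reads $S(x,Q) = x^2 q''/q - \frac{1}{2}(x q'/q)^2 + 2x q'/q + \frac{1}{2}$; for $q(x \mid \theta) = \theta e^{-\theta x}$ this becomes $\frac{1}{2}\theta^2 x^2 - 2\theta x + \frac{1}{2}$, whose sample average is minimized at $\hat\theta = 2\sum_i x_i / \sum_i x_i^2$. The function names, sample size and seed are illustrative choices only. The last function evaluates the ordinary Hyvärinen score of the induced density of $\ln X$, which should agree with (80) pointwise.

```python
import math
import random


def modified_hyvarinen_score(x, theta):
    """Assumed form of rule (80) at the exponential density theta * exp(-theta * x)."""
    u1 = -theta        # q'(x) / q(x)
    u2 = theta**2      # q''(x) / q(x)
    return x**2 * u2 - 0.5 * (x * u1) ** 2 + 2.0 * x * u1 + 0.5


def log_scale_hyvarinen_score(x, theta):
    """Hyvarinen score of the density of ln X, evaluated at x_tilde = ln x."""
    x_tilde = math.log(x)
    d1 = 1.0 - theta * math.exp(x_tilde)  # (ln q_tilde)'
    d2 = -theta * math.exp(x_tilde)       # (ln q_tilde)''
    return d2 + 0.5 * d1**2


def empirical_score(xs, theta):
    """Average modified score over the sample."""
    return sum(modified_hyvarinen_score(x, theta) for x in xs) / len(xs)


def estimate_theta(xs):
    """Minimizer of the empirical modified score: 2 * sum(x) / sum(x^2)."""
    return 2.0 * sum(xs) / sum(x * x for x in xs)


def simulate_estimate(theta=2.0, n=20000, seed=0):
    """Estimate theta from n simulated exponential observations."""
    rng = random.Random(seed)
    return estimate_theta([rng.expovariate(theta) for _ in range(n)])
```

By contrast, the unmodified score $\frac{1}{2}\theta^2$ does not depend on the data at all, so its minimizer $\hat\theta = 0$ carries no information about the sample.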

12. Discussion and further work. In this paper we have investigated lo- 
cal scoring rules only for the case that the sample space is an open interval 
on the real line. The general ideas extend to the case that the sample space is 
a simply-connected d-dimensional differentiable manifold. This raises chal- 
lenging new technical problems, but could deliver a fundamentally improved 
understanding and illuminate issues associated with boundary problems. 